Section 1


Introduction

1.1 What is a network protocol?

The function of a communications network is to provide a temporary
link between an information source (human voice, data terminal, computer)
and the appropriate destination. A set of rules is necessary in order
to establish and terminate such a connection; these rules are called
network protocols. They generally require control information to be
transmitted through the network in addition to message data. This control
information may be regarded as a network overhead, and is called
protocol information.

Examples of protocol information include the beginning, the end
and the destination of a message, all of which are discussed in this
work. Other protocols are associated with network operation, and
include routing and supervision.


1.2 Why are network protocols important?

A communications network has finite resources, including channel
bandwidth and buffer storage, with which to service potential users.
Overheads, including protocol data, have an associated cost relating
to the bandwidth they occupy and the transmission delays that messages
incur due to the accompanying control data. In order to allocate
network resources efficiently, overheads must be kept to a necessary
minimum, and this research is directed towards that end.

In addition to reducing overheads in existing networks, the
theoretical study of network protocols can offer some insight into
how to improve system design, as will be illustrated in the discussion
of addressing information.

1.3 Data communication networks

Data communication networks will be used to illustrate the design
and evaluation of network protocols. These networks will be assumed to
consist of a finite collection of nodes, to which computers are attached,
interconnected by two way noiseless channels of fixed capacity. The nodes
are store and forward centers for messages passing through the network.

Messages originate at the computers with random arrival times
and data lengths. The network capacity will be assumed sufficient to
ensure that, despite heavy loading, messages will not incur delays in
excess of a specified time during transmission.

In  order  to  concentrate  on  specific  protocols  concerned  with 
message  lengths  and  destinations,  it  is  necessary  to  subdivide  the 
protocols  in  a network  into  hierarchical  levels  including: 

(i) Process to process [1] (programs within computers)

(ii) Host to host (computer locations)

(iii) Interconnecting networks [3]

(iv) Subnetworks or line protocols [4]

Interest in this thesis will be directed towards the fourth
category, which includes the transmission of messages between nodes.

1.4 Background to network protocols

The  theoretical  study  of  network  protocols  can  be  separated 
into  two  components,  the  derivation  of  lower  bounds  for  protocol  information, 
and  the  construction  of  coding  schemes  to  achieve  these  bounds.  Recent 
work  has  concentrated  on  the  problem  of  constructing  lower  bounds,  and 
has used information theory to represent protocol information by a
source code. Based on the results of this work, protocol encoding
schemes will be presented which are close to the lower bounds.

The practical development of distributed computer networks
originated in the 1960's with the ARPA network [1]. This system uses a
packet switching approach, in which messages are subdivided into
packets, each containing address and length information. The packets
are independently routed through the network and assembled at the
destination. A more recent system replaces the packet by a statistical
multiplexor technique (as used by Codex). This allocates to each
source sharing a common channel a separate variable length time slot
for its contents.

These two approaches will be used to illustrate how an
understanding of the nature of protocol information can suggest systems
which have practical application in existing and future networks.


1.5 Outline of research


The  following  section  will  indicate  which  results  from  information 
theory  have  application  to  the  understanding  of  protocol  information. 
Sections 3, 4 and 5 present a detailed discussion of start-stop protocols
for a single source and receiver communicating over a channel. Start-stop
information allows the receiver to identify the beginning and end of a
message in a continuous stream of binary data.

Three protocol strategies for conveying start-stop information
are described and compared in section 4. These are modified versions
of existing schemes: fixed length packets, variable length packets, and
terminal flags.

The concept of address information is introduced in section 6,
when many source/receiver pairs communicate over a single channel.
Two coding strategies for start-stop and destination information are
described and compared: a Huffman coding scheme and a Universal coding
scheme.

Finally, in sections 6.8 and 7, mention is made of the application
of the protocol strategies to practical networks, together with some
comments on the success of the source coding approach.


Section  2 


A source  coding  approach  to  protocol  information 
2.1  Introduction 

Protocol information is needed in a communications network
essentially to resolve the statistical uncertainties associated with
incoming messages, including arrival times, message length and
destination. Information theory helps derive a lower bound on such
information, and suggests in some cases an encoding scheme which achieves
that bound [4,5].

Two concepts from information theory, source codes and source entropy,
will be discussed briefly, before passing on to practical coding schemes
for encoding protocol information.

2.2  A source  code 

Consider the protocol information which describes message length.
If the length is a random variable, then each element, $a_k$, belonging to
the set of all possible lengths, X, can be described by its probability
of occurrence $P_X(a_k)$. The probability function, $P_X$, forms a complete
statistical characterization of the information source, X. The protocol
information describing message length is, to the information theorist,
a source.

For any information source, X, there is a quantity called the source
entropy or self information, H(X). The entropy represents a lower bound
for the average number of binary digits, $\bar{n}$, required to encode each
source letter, $a_k$. For a source with K elements,

$$H(X) = \sum_{k=1}^{K} P_X(a_k) \log_2 \frac{1}{P_X(a_k)} \tag{2.2.1}$$

The lower bound is expressed in a source coding theorem which
states that for a source, X, it is possible to assign prefix codewords to
the source letters, $a_k$, in such a way that the average length of a
codeword, $\bar{n}$, satisfies

$$\bar{n} < H(X) + 1 \tag{2.2.2}$$

and for a uniquely decodable set of codewords

$$\bar{n} \geq H(X) \tag{2.2.3}$$

The theoretical limit of (2.2.3) can be approached by employing efficient
coding techniques including the Huffman code [6,7].

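As a numerical illustration of the bounds (2.2.2) and (2.2.3), the
following minimal Python sketch computes the entropy of a small source
and the average codeword length of a binary Huffman code for it. The
four-letter probability distribution and the function names are
illustrative assumptions, not taken from the text.

```python
import heapq
import math


def entropy(probs):
    """Source entropy H(X) in bits, following (2.2.1)."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)


def huffman_lengths(probs):
    """Return the binary Huffman codeword length of each source letter."""
    # Heap items: (probability, tie-breaker, letters contained in the subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:
            lengths[i] += 1          # each merge adds one digit to these letters
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths


# Hypothetical source with K = 4 letters.
probs = [0.5, 0.25, 0.125, 0.125]
lengths = huffman_lengths(probs)
n_bar = sum(p * l for p, l in zip(probs, lengths))
H = entropy(probs)
print(lengths, n_bar, H)
```

For this particular distribution every probability is a power of one
half, so the average length meets the entropy exactly; for other
distributions it lies between H(X) and H(X) + 1.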
2.3  Start-stop  information 

A source  of  binary  data  can  exist  in  either  of  two  states;  the  idle 
state  during  which  it  generates  idle  characters,  and  the  busy  state  during 
which  it  generates  messages.  The  start-stop  information  need  only  express 
the  lengths  of  each  consecutive  state,  i.e.,  the  idle  and  busy  periods. 

To appreciate this, consider a receiver which is informed of the initial
state of the source, and the lengths of all subsequent states. It will
then be able to reconstruct from the incoming data stream the idle and
busy periods.

Coding of start-stop information involves two independent
information sources; one belonging to the idle period, and one to the busy
period. The source elements are the different lengths of the idle or
busy states. According to the interarrival time and message length
statistics, the entropy of the sources can be calculated using (2.2.1),
and an efficient coding scheme designed to meet the bounds (2.2.2) and
(2.2.3).

Poisson  arrivals  and  geometric  length  statistics  will  be  assumed 
in  the  discussion  of  start-stop  information  which  follows. 


Section  3 


Start-Stop  Protocols 
3.1 Single source/receiver model

The model proposed to investigate start-stop protocols consists
of a single data source communicating with a receiver through a fixed
capacity (one digit/second) channel. Between the source and the channel
is placed a node (data processor) acting as a buffer for incoming
messages, and able to generate the protocol information necessary for
communication (Fig. 3.1).

The source generates messages with interarrival times modelled by
the Poisson process, and lengths described by a geometric probability
distribution. Each message, upon arrival from the source, joins a queue at
the source node.

3.2 Encoding start information

To  appreciate  the  significance  of  start-stop  information,  consider 
the  above  system  in  an  idle  state  (i.e.,  the  source  node  contains  no 
messages).  After  a random  interval  of  time,  a message  is  generated  by 
the  source  and  joins  the  empty  queue,  awaiting  transmission.  The  node 
must  communicate  the  change  of  state  to  the  receiver  before  transmitting 
data. 

It is not possible to predetermine the length of the idle period
until the next arrival occurs, so the source must send frequent state
information to the receiver. Any attempt to encode this information into
a reduced form will result in probable message delays.

For example, consider sending only one idle character (say a 0)
for every L seconds spent in the idle state. Should a message arrive
during the intervening period, a delay of up to L seconds will be
incurred before the receiver is informed of the change of state.

To avoid such a delay, the following strategy is an obvious
choice. During the idle period, idle characters (for example, binary
zeros) are transmitted every second. On arrival of a message at the
source node, a busy character is transmitted (say a binary one). This
strategy ensures minimum delay for message data at the expense of a less
efficient encoding of idle characters.

Accepting the idle characters as a necessary cost for avoiding
delays, the single start bit (indicating the transition to the busy
state) must be included as part of the start protocol information
accompanying each message.

3.3  Encoding  stop  information 

Once the source node enters the busy state, it remains there for
at least one message. Protocol information must be sent to the receiver
to indicate the end of the message. It is sufficient to send an encoding
of the message length itself as stop information. An efficient coding
scheme exists for this purpose, and is discussed as a possible optimal
strategy for stop protocol information.

Two other approaches exist in practical networks, and are
compared in efficiency with the proposed optimal scheme. The first employs
packets of fixed length, and the second places a unique flag at the end
of the message data. The method of comparison is based on the estimation
of mean message delays at the source node. An M/G/1 queueing process is
used to analyze the model on account of the special source statistics.

On  completed  transmission  of  a message,  the  receiver  awaits  state 
information  from  the  source  before  accepting  a following  message. 

3.4 M/G/1 Queues

Messages arriving at the source node form a Poisson input queue
with mean arrival rate of $\lambda$. An addition of S bits of protocol
information is made to each message of length M, generating a combined
block length of B bits.

$$B = M + S \tag{3.4.1}$$


The channel, considered as a server, takes B seconds to transmit
each message in the queue, and has an arbitrary service rate of
$E(B)^{-1}$ per second, dependent on the protocol strategy used to
generate S. The queueing process is therefore described by an M/G/1
model: Poisson input and general service time.

The Pollaczek-Khintchine formula provides a value for the expected
system size, E(Q), that is the number of messages in the queue and in
service, in terms of traffic intensity, $\rho$, arrival rate, $\lambda$,
and variance of service time, var(B) [8].

$$\rho = \lambda E(B) < 1 \tag{3.4.2}$$

$$E(Q) = \rho + \frac{\rho^2 + \lambda^2\,\mathrm{var}(B)}{2(1-\rho)} \tag{3.4.3}$$


The expected system size, E(Q), may be expressed in terms of the
first and second moments of service time, by expanding the variance of
B as $\mathrm{var}(B) = E(B^2) - E(B)^2$:

$$E(Q) = \lambda E(B) + \frac{\lambda^2 E(B^2)}{2(1 - \lambda E(B))} \tag{3.4.4}$$


The expected waiting time, E(W), in the queue and in service, can
be obtained by Little's formula.

$$E(W) = E(Q)/\lambda \tag{3.4.5}$$

The dependence of waiting time on the first and second moments of
block length carries some implications for an efficient protocol strategy.
It should employ a minimal average number of bits, E(S), and have a small
variance associated with this mean. Mean waiting time is given by:

$$E(W) = E(B) + \frac{\lambda E(B^2)}{2(1 - \lambda E(B))} \tag{3.4.6}$$
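The relations (3.4.2) to (3.4.6) can be evaluated directly once the first
two moments of block length are known. The Python sketch below is one
possible way of doing so; the function name and the example figures (an
arrival rate of 0.5 with exponentially distributed and with constant
service times of mean 1) are illustrative assumptions only.

```python
def mg1_delay(lam, mean_b, second_moment_b):
    """Return (rho, E(Q), E(W)) for an M/G/1 queue.

    lam             -- Poisson arrival rate (messages per second), lam > 0
    mean_b          -- first moment of service time, E(B)
    second_moment_b -- second moment of service time, E(B^2)
    """
    if lam <= 0:
        raise ValueError("arrival rate must be positive")
    rho = lam * mean_b                                   # traffic intensity (3.4.2)
    if rho >= 1:
        raise ValueError("unstable queue: need lam * E(B) < 1")
    e_q = rho + lam ** 2 * second_moment_b / (2 * (1 - rho))   # system size (3.4.4)
    e_w = e_q / lam                                      # Little's formula (3.4.5)
    return rho, e_q, e_w


# Exponential service with mean 1 has E(B^2) = 2; constant service has E(B^2) = 1.
print(mg1_delay(0.5, 1.0, 2.0))
print(mg1_delay(0.5, 1.0, 1.0))
```

With the same mean, the constant service time gives the smaller waiting
time, which is the reason a protocol strategy should keep the variance of
block length small as well as its mean.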


3.5  Minimizing  protocol  information 

Associated with all protocol strategies discussed herein, there
appears an independent parameter, L, on which the protocol length, S,
depends. For example, L is the length of a packet in the fixed packet
strategy. Block length is dependent on L through the variable S,

$$B(L) = M + S(L) \tag{3.5.1}$$

To ensure a meaningful comparison between queue delays in
different protocol strategies, it is necessary to optimize the value of L.
A suitable criterion for optimization is to choose L to minimize expected
block length, or equivalently mean protocol information, E(S(L)). The
optimum value, L°, satisfies

$$\left.\frac{dE(B)}{dL}\right|_{L=L^{\circ}} = \left.\frac{dE(M)}{dL}\right|_{L=L^{\circ}} + \left.\frac{dE(S(L))}{dL}\right|_{L=L^{\circ}} = 0 \tag{3.5.2}$$


The moments of service time, B, can then be calculated in terms of
L°, allowing a meaningful comparison of E(W) between strategies.


3.6  Summary  of  the  analytical  technique 

It is proposed to compare three protocol strategies for
transmitting start-stop information across a single channel, by calculating
the mean waiting time of messages arriving at the source node. Waiting
time is of practical significance in data networks and is a useful
indication of the efficiency of a protocol scheme.

In order to calculate waiting times, the first and second moments
of block length, B, must be derived (block length, B, includes both
message and protocol data).

The three protocol strategies of interest are the fixed packet
strategy in which messages are subdivided into fixed length sections,
the terminal flag strategy, and a scheme based on Huffman encoding. The
last strategy corresponds to a variable length packet approach, and has
similarities to schemes implemented in packet switched networks.


Section  4 


Three  start-stop  protocol  strategies 

4.1  Introduction 

In the previous section, a single source/receiver model was
proposed on which to evaluate different start-stop protocols, together with
a criterion for assessing their relative efficiencies in terms of
queueing delay. This section will examine three different strategies,
all of which have practical counterparts, although in somewhat modified
forms. One strategy, the Huffman encoding of length, is based on source
coding ideas.

The comparison between the three strategies is intended to
illustrate the performance of schemes devised as practical solutions in
operating networks against a theoretical solution advanced in the thesis.


4.2  Fixed  length  packet  strategy 

Each message is transmitted in a sequence of fixed length packets,
of L bits. For a message of M bits, the number of packets employed, N,
is given by:

$$N = \lceil M/L \rceil \tag{4.2.1}$$

where $\lceil M/L \rceil$ denotes the smallest integer greater than or
equal to M/L.

The message length, a random variable, is not usually an integer
multiple of L, causing a redundancy of R bits in the last fixed packet.
A length specifier, placed at the end of the message, encodes the useful
number of message bits, (L-R), in the last packet. If L is an integer
power of 2, then a fixed codeword of length $\log_2 L$ is sufficient to
encode these bits. Otherwise a set of variable length words, some shorter
and some longer than $\log_2 L$ but having the same mean value, must be
employed.

Each packet is preceded by a busy bit, or binary one, to indicate
the arrival of another packet. A binary zero is placed in between the
last packet and the length specifier to indicate the end of the packet
sequence. The receiver can then identify the following variable length
codeword, and subsequently return to the idle state (Fig. 4.1).
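The framing just described can be sketched in Python as follows. This is
a minimal illustration rather than the implementation used in any
network: it assumes that L is a power of 2, that the redundant bits of
the last packet are filled with zeros, and that the length specifier
carries the useful bit count minus one in log2(L) binary digits. These
conventions, and the function names, are assumptions made for the
example.

```python
def encode_fixed_packets(message, L):
    """Frame message bits as busy-bit-prefixed packets of L bits (L a power of 2)."""
    spec_bits = L.bit_length() - 1                 # log2(L) when L is a power of 2
    if L < 1 or (1 << spec_bits) != L:
        raise ValueError("this sketch assumes L is a power of 2")
    if not message:
        raise ValueError("message must contain at least one bit")
    n_packets = -(-len(message) // L)              # N = ceil(M / L)
    redundant = n_packets * L - len(message)       # R unused bits in the last packet
    padded = list(message) + [0] * redundant
    block = []
    for k in range(n_packets):
        block.append(1)                            # busy bit before every packet
        block.extend(padded[k * L:(k + 1) * L])
    block.append(0)                                # zero marks the end of the packets
    useful = L - redundant                         # between 1 and L
    # Assumed convention: the specifier carries (useful - 1) in log2(L) digits.
    block.extend(((useful - 1) >> i) & 1 for i in reversed(range(spec_bits)))
    return block


def decode_fixed_packets(block, L):
    """Recover the message bits from a block built by encode_fixed_packets."""
    spec_bits = L.bit_length() - 1
    pos, data = 0, []
    while block[pos] == 1:                         # busy bit: another packet follows
        data.extend(block[pos + 1:pos + 1 + L])
        pos += 1 + L
    pos += 1                                       # skip the terminating zero
    spec = block[pos:pos + spec_bits]
    useful = 1 + int("".join(map(str, spec)) or "0", 2)
    return data[:len(data) - (L - useful)]         # drop the redundant bits


message = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]           # M = 10 bits
block = encode_fixed_packets(message, 4)           # L = 4, so N = 3 and R = 2
print(len(block), decode_fixed_packets(block, 4) == message)
```

For this example the block holds N busy bits, NL packet bits, one zero
and a two digit specifier, 18 bits in all.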

4.3  Block  length  statistics 

Define  the  sequence  of  data  which  includes  both  the  message  and 
protocol  data  as  a single  block,  of  length  B bits.  Some  statistics  of  B 
must  be  derived  in  order  to  determine  queueing  delays  (see  equation 
3.4.6). 

The number of packets, N, has a geometric probability distribution
(see Appendix 4a). The protocol information in each block is coded into
S bits which include N busy bits, a length specifier with a zero
preceding it, and R bits of redundant data in the final packet

$$S(L) = N + \log_2 L + 1 + R \tag{4.3.1}$$

where

$$R = NL - M \tag{4.3.2}$$

From (3.5.1)

$$B(L) = (L+1)N + \log_2 L + 1 \tag{4.3.3}$$

The first moment of B is obtained from (4.3.3), with L as a fixed
parameter:

$$E(B) = (L+1)E(N) + \log_2 L + 1 \tag{4.3.4}$$

In Appendix 4b, the parameter L is chosen to minimize the expected
block length, E(B), according to (3.5.2). The optimum packet length, L°,
and number of packets, E(N), were found to be:

$$L^{\circ} = (2E(M))^{1/2} - 1.782 + O(E(M)^{-1/2}) \tag{4.3.5}$$

$$E(N) = \left.\frac{1}{1-a^{L}}\right|_{L=L^{\circ}} = (\tfrac{1}{2}E(M))^{1/2} + 1.39 + O(E(M)^{-1/2}) \tag{4.3.6}$$

where the first equality in (4.3.6) is taken from Appendix 4a.

In data networks, typical message lengths are confined to the
range $10 < E(M) < 10^5$. It is thus reasonable to neglect terms of order
$E(M)^{-1/2}$ and below. Table (1) confirms the following approximations to
be acceptable for L° and E(N) (as given in (4.3.5), (4.3.6)), in the
range $10 < E(M) < 10^5$,

$$L^{\circ} \approx (2E(M))^{1/2} - 1.782 \tag{4.3.7}$$

$$E(N) \approx (\tfrac{1}{2}E(M))^{1/2} + 1.39 \tag{4.3.8}$$
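The approximations for L° and E(N) can be compared against a direct
numerical minimization of (4.3.4). The Python sketch below does this
under stated assumptions: message length is taken to be geometric with
$P(M = m) = (1-a)a^{m-1}$ and $a = 1 - 1/E(M)$, the mean number of packets
is taken as $E(N) = 1/(1-a^L)$ as quoted from Appendix 4a, and L is
restricted to integers. The function names and the search range are
choices made for the example.

```python
import math


def expected_block_length(L, mean_m):
    """E(B) of (4.3.4) for packet length L and mean message length E(M)."""
    a = 1.0 - 1.0 / mean_m               # assumed geometric parameter
    e_n = 1.0 / (1.0 - a ** L)           # E(N), as quoted from Appendix 4a
    return (L + 1) * e_n + math.log2(L) + 1


def best_packet_length(mean_m):
    """Integer L minimizing E(B), found by exhaustive search."""
    candidates = range(1, int(10 * math.sqrt(mean_m)) + 2)
    return min(candidates, key=lambda L: expected_block_length(L, mean_m))


mean_m = 1000.0
L_numeric = best_packet_length(mean_m)
L_approx = math.sqrt(2 * mean_m) - 1.782                 # approximation for L
a = 1.0 - 1.0 / mean_m
n_numeric = 1.0 / (1.0 - a ** L_numeric)
n_approx = math.sqrt(mean_m / 2) + 1.39                  # approximation for E(N)
print(L_numeric, L_approx, n_numeric, n_approx)
```

The search is exhaustive over a generous range rather than relying on
calculus, so it makes no assumption about the shape of E(B) as a function
of L.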

Values for the first and second moments of B can now be obtained
as functions of E(M) alone. Consider the square of block length, B, as
given in (4.3.3). The expected value of this expression is the second
moment:

$$E(B^2) = (L+1)^2 E(N^2) + 2(L+1)E(N)(\log_2 L + 1) + (\log_2 L + 1)^2 \tag{4.3.9}$$

From Appendix 4a, $E(N^2) = 2E^2(N) - E(N)$. Substituting the values of L°
and E(N) found above into the moments E(B) and $E(B^2)$ given in (4.3.4)
and (4.3.9):

$$E(B) \approx E(M) + 1.43\,E(M)^{1/2} + \log_2 (2E(M))^{1/2} \tag{4.3.10}$$

$$\begin{aligned}
E(B^2) \approx{}& 2E(M)^2 + E(M)\left[4.3\,E(M)^{1/2} + 0.47 + 2\log_2\left((2E(M))^{1/2} - 1.78\right)\right] \\
&+ E(M)^{1/2}\left(2.86\,\log_2\left((2E(M))^{1/2} - 1.78\right) - 0.74\right) \\
&- 2.2\,\log_2\left((2E(M))^{1/2} - 1.78\right) - 2.2 \\
&+ \left(1 + \log_2\left((2E(M))^{1/2} - 1.78\right)\right)^2
\end{aligned} \tag{4.3.11}$$

Although it has been necessary to approximate some of the
coefficients in the above expressions, the functional relationships have
been preserved sufficiently well to illustrate later that this strategy
has considerably larger moments, E(B) and $E(B^2)$, than the other
strategies.

4.4  Concluding  remarks  for  the  fixed  length  strategy 

The major inefficiency in this strategy is the redundancy in the
last packet. By removing the necessity for fixed packet length on the
final packet, and rearranging the length specifier, this redundancy could
be avoided. Such an observation suggested a strategy which corresponds
to the Huffman encoding of message length, as will be seen in the next
paragraphs.

4.5 Huffman length encoding strategy

By employing a direct encoding of message length, M, to supply
the stop information for each message block, B, an efficient protocol
strategy can be achieved. The integer, M, is assumed to have a geometric
probability distribution and can be encoded by a Huffman coding
procedure, to give an average word length, $\bar{n}_s$, which exceeds the
source entropy, H(S), by an average of 0.03 bits.

The entropy of message content and length, H(S), may be expressed
as a function of the mean length of a busy period, $1/\epsilon$.

$$H(S) = \frac{1}{\epsilon} + \frac{1}{\epsilon}\mathcal{H}(\epsilon) \quad \text{bits/message} \tag{4.5.1}$$

where $\mathcal{H}(x) = -x\log_2(x) - (1-x)\log_2(1-x)$.

The first term contains the entropy of message content, and the
second the entropy of message length or stop information. The Huffman
encoding of length achieves an average redundancy of 0.03 bits above the
source entropy. Average codeword length of stop information encoded by
the Huffman scheme is $\bar{n}_s$, where

$$\bar{n}_s = \frac{1}{\epsilon}\mathcal{H}(\epsilon) + 0.03 \quad \text{bits/message} \tag{4.5.2}$$

The binary encoding of message data achieves on average E(M) bits/
message, by definition, and thus

$$E(M) = \frac{1}{\epsilon} = \frac{1}{1-a} \quad \text{bits/message} \tag{4.5.3}$$
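The length term of (4.5.1) can be checked numerically. The Python sketch
below sums the entropy of a geometric length distribution term by term
and compares it with the closed form, the binary entropy of $\epsilon$
divided by $\epsilon$. It assumes the distribution
$P(M = m) = \epsilon(1-\epsilon)^{m-1}$ for m = 1, 2, ..., which is one
common form of the geometric distribution; the value
$\epsilon = 0.01$ and the function names are illustrative.

```python
import math


def binary_entropy(x):
    """Entropy in bits of a binary choice with probabilities x and 1 - x."""
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def geometric_length_entropy(eps, terms=100000):
    """Entropy in bits of P(M = m) = eps * (1 - eps)**(m - 1), by summation."""
    total = 0.0
    for m in range(1, terms + 1):
        p = eps * (1 - eps) ** (m - 1)
        if p == 0.0:                 # remaining terms underflow to zero
            break
        total -= p * math.log2(p)
    return total


eps = 0.01                            # mean message length 1/eps = 100 bits
numeric = geometric_length_entropy(eps)
closed_form = binary_entropy(eps) / eps
print(numeric, closed_form)
```

For a mean message length of 100 bits the stop information therefore
carries roughly eight bits of entropy per message, against 100 bits of
message content.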

The start information is coded in the same manner as in the fixed
packet strategy, with binary zeros transmitted during the idle period,
and a binary one transmitted to indicate the beginning of a busy period.
As no information about the length of the idle period is available in
advance, unlike the length of messages (which arrive instantaneously), no
economy in coding can be achieved.

Although the separate encoding of length and message information
closely approaches the sum of the first two terms in (4.5.1), an
improvement can be made by using a joint encoding scheme. The construction
of a Huffman code for the joint alphabet is particularly difficult, and
has not been performed because of the insignificant saving of a fraction
of a bit, i.e., under the present scheme of separate encoding, only 0.03
bits are wasted on average. The separate and joint source trees are
illustrated in Fig. 4.2.

The codeword specifying message length is constructed as follows.
Let L be the integer which satisfies the inequality:

$$a^{L} + a^{L+1} \leq 1 < a^{L} + a^{L-1} \tag{4.5.4}$$


The mean message length, E(M), falls in the range $10 < E(M) < 10^5$ in
most networks. An approximate value for $a^L$ may be found under this
assumption, where a is defined in (4.5.3),

$$a^{L} = \tfrac{1}{2} + O(E(M)^{-1}) \approx \tfrac{1}{2} \tag{4.5.5}$$


The integer, M, may be represented by the expression:

$$M = (N-1)L + [M] \bmod (L) \tag{4.5.6}$$

The integer, N, is defined by equation (4.2.1). The length
encoding becomes the concatenation of a unary code of N-1 binary ones
followed by a zero, and a variable length codeword, of length $\log_2 L$,
which encodes [M]mod(L).

4.6  Implementing  the  length  encoding  strategy 

The  similarity  between  the  fixed  length  strategy  and  the  one 
described  above  becomes  apparent  when  the  Huffman  code  is  implemented  in 
the  following  way. 

A message of length M bits is decomposed into N-1 packets of
length L, and a final packet of length less than L. A busy bit is
transmitted in front of each of the first N-1 packets (a binary one,
corresponding to the unary code in the Huffman scheme). A binary zero is
placed after these packets to indicate the arrival of the length
specifier, which encodes the number of bits in the final packet. The
remaining message bits, [M]mod(L), follow the length specifier (see
Fig. 4.3). The set of N-1 busy bits and the length specifier are
equivalent to the two words in the Huffman scheme, although they are
placed apart.
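The two-part length codeword of section 4.5 (a unary part followed by a
remainder part) can be sketched in Python as below. This is an
illustration under assumptions rather than the exact construction of the
thesis: the quotient and remainder are taken from M-1, so that lengths
start at M = 1 and the unary part has N-1 ones with N as in (4.2.1), and
the remainder is sent with a truncated binary code, which uses either
the integer part of log2(L) or one more digit when L is not a power of 2.
All function names are hypothetical.

```python
def golomb_parameter(a):
    """Smallest integer L with a**L + a**(L+1) <= 1, which satisfies (4.5.4)."""
    L = 1
    while a ** L + a ** (L + 1) > 1:
        L += 1
    return L


def to_bits(value, width):
    """Binary digits of value, most significant first, in exactly width digits."""
    return [(value >> i) & 1 for i in reversed(range(width))]


def encode_length(M, L):
    """Codeword for a message length M >= 1 with parameter L."""
    q, rem = divmod(M - 1, L)            # q plays the role of N - 1
    bits = [1] * q + [0]                 # unary part: q ones and a closing zero
    b = (L - 1).bit_length()             # ceil(log2(L))
    cutoff = (1 << b) - L                # remainders below cutoff use b - 1 digits
    if rem < cutoff:
        bits += to_bits(rem, b - 1)
    else:
        bits += to_bits(rem + cutoff, b)
    return bits


def decode_length(bits, L):
    """Inverse of encode_length."""
    pos = 0
    while bits[pos] == 1:
        pos += 1
    q = pos
    pos += 1                             # skip the closing zero
    b = (L - 1).bit_length()
    cutoff = (1 << b) - L
    rem = 0
    for _ in range(max(b - 1, 0)):
        rem = (rem << 1) | bits[pos]
        pos += 1
    if b > 0 and rem >= cutoff:          # long form: read one more digit
        rem = ((rem << 1) | bits[pos]) - cutoff
    return q * L + rem + 1


a = 0.9                                  # mean message length 1 / (1 - a) = 10 bits
L = golomb_parameter(a)
print(L, encode_length(10, L), decode_length(encode_length(10, L), L))
```

Choosing the smallest L that satisfies the left-hand inequality of
(4.5.4) automatically satisfies the right-hand one, since L-1 then fails
the left-hand test.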

A start bit is placed in front of the first busy bit in order to
avoid confusion when only one packet is transmitted in a message (i.e.,
M < L), and no busy bit is included before the length specifier.

The protocol information, S(L), includes a start bit, N-1 busy
bits ('1's), a '0' placed before the length specifier, and the specifier,
of length $\log_2(L)$:

$$S(L) = 1 + (N-1) + 1 + \log_2(L) \tag{4.6.1}$$

The total length of the message and protocol information, B,
becomes

$$B = M + S = M + N + 1 + \log_2(L) \tag{4.6.2}$$

The expected value of the integer, N, is obtained from the
approximation (4.5.5) and equation (A4) from Appendix (4a).

$$E(N) = \frac{1}{1-a^{L}} = 2.0 + O(E(M)^{-1}) \approx 2.0 \tag{4.6.3}$$


The expected block length, E(B), becomes

$$E(B) = E(M) + E(N) + 1 + \log_2(L) \tag{4.6.4}$$

The optimum value for L is obtained from the constraint imposed
upon $a^L$, (4.5.4). From (4.5.3) and (4.5.5),

$$L^{\circ} = \ln(2)\,E(M) + O(E(M)^{-1}) \tag{4.6.5}$$

The expected block length, E(B), given in (4.6.4) can be expressed
in terms of E(M)

$$E(B) = E(M) + 3.0 + \log_2(E(M)\ln 2) + O(E(M)^{-1}) \approx E(M) + 3.0 + \log_2(E(M)\ln 2) \tag{4.6.6}$$


4.7  Optimality  of  the  Huffman  scheme 

The Huffman encoding of length information provides a set of code
words whose average length is close to the source entropy (0.03 bits
larger than H(S)). From information theory, the Huffman coding is more
efficient in this sense than other coding schemes which can be devised.
If the objective of a protocol strategy is to minimize the average
overhead in a network, the Huffman scheme will satisfy this condition.

A more practical measure of protocol coding efficiency in a
network is the transmission delay incurred by messages operating under a
specific protocol. Transmission delay in data communication networks is
related to queueing time at nodes, which has been shown to depend on both
the first and second moments of block length, B (Eq. 3.4.6).

4.8 Second moment of block length, $E(B^2)$

Squaring the value for B, given in (4.6.2), and taking the mean,

$$E(B^2) = E((M+N)^2) + 2E(M+N)(\log_2 L + 1) + (\log_2 L + 1)^2 \tag{4.8.1}$$

In order to evaluate this expression, the joint moments of (M+N)
must be derived. This has been done in Appendix (4c) in terms of a
new random variable, C, where

$$C = M + N \tag{4.8.2}$$

The moment generating function of C allows the moments E(C) and
$E(C^2)$ to be derived as follows (see Appendix 4c):

$$E(C) = E(M+N) = E(M) + E(N) \tag{C5}$$

$$E(C^2) = E((M+N)^2) = 2E^2(M) + 2E^2(N) + 2E(M)E(N) - E(M) - E(N) + 2L\,\mathrm{var}(N) \tag{C13}$$

Choosing the parameter L, as in (4.6.5), and the corresponding
value of E(N), as in (4.6.3), the above moments may be expressed in
terms of E(M) alone. For $L^{\circ} = E(M)\ln 2$ and $E(N) = 2.0$,

$$E(M+N) = E(M) + 2.0 \tag{4.8.3}$$

and

$$E((M+N)^2) \approx 2E^2(M) + 5.7724\,E(M) + 6.0 \tag{4.8.4}$$

The second moment, $E(B^2)$, can now be expressed as a function of
E(M),

$$E(B^2) \approx 2E^2(M) + E(M)\left(7.7724 + 2\log_2(E(M)\ln 2)\right) + 4\log_2(E(M)\ln 2) + \left(\log_2(E(M)\ln 2) + 1\right)^2 + 10.0 \tag{4.8.5}$$
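The closed forms (4.6.6) and (4.8.5) can be combined with the waiting
time formula (3.4.6) to give the mean delay of the Huffman length
encoding strategy as a function of E(M) and the arrival rate. The Python
sketch below does this and also rebuilds the second moment from (4.8.1),
(4.8.3) and (4.8.4) as a consistency check. The mean message length of
100 bits and the arrival rate of 0.005 messages per second are arbitrary
example values, and the function names are hypothetical.

```python
import math


def huffman_block_moments(mean_m):
    """Approximate E(B) and E(B^2) for the Huffman length encoding strategy."""
    ell = math.log2(mean_m * math.log(2))          # log2(L) at L = E(M) ln 2
    e_b = mean_m + 3.0 + ell                                        # (4.6.6)
    e_b2 = (2 * mean_m ** 2 + mean_m * (7.7724 + 2 * ell)
            + 4 * ell + (ell + 1) ** 2 + 10.0)                      # (4.8.5)
    return e_b, e_b2


def second_moment_from_parts(mean_m):
    """E(B^2) rebuilt from (4.8.1) using (4.8.3) and (4.8.4)."""
    ell = math.log2(mean_m * math.log(2))
    e_c = mean_m + 2.0                                              # (4.8.3)
    e_c2 = 2 * mean_m ** 2 + 5.7724 * mean_m + 6.0                  # (4.8.4)
    return e_c2 + 2 * e_c * (ell + 1) + (ell + 1) ** 2              # (4.8.1)


def mean_waiting_time(lam, e_b, e_b2):
    """Mean time in queue and service, from (3.4.6)."""
    if lam * e_b >= 1:
        raise ValueError("unstable queue: need lam * E(B) < 1")
    return e_b + lam * e_b2 / (2 * (1 - lam * e_b))


mean_m, lam = 100.0, 0.005
e_b, e_b2 = huffman_block_moments(mean_m)
e_w = mean_waiting_time(lam, e_b, e_b2)
print(e_b, e_b2, e_w)
```

The same `mean_waiting_time` function can be applied to the moments of
the other strategies, which is the comparison the text sets out to make.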


4.9  Terminating  flag  strategy 

A unique bit pattern of r+1 digits (a flag) is used to indicate the
end of a message. When the receiver identifies the flag, it assumes that
transmission of the current data is complete, and awaits either a new
message or idle bits. To prevent premature terminations, the source must
recognize and modify any r bit pattern which is identical to the first
r bits of the flag. The encoding consists of the insertion, after the
pattern, of a single bit which is complementary to the (r+1)th flag bit
(see Fig. 4.4).

The receiver is constantly looking for the flag pattern. On
receiving the first r of these bits, it inspects the following bit. If
it is identical to the final flag bit, the receiver terminates the
message. If the two are different, the receiver deletes the inserted bit
from the message, and continues to accept the incoming message.
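The insertion and deletion rules can be sketched in Python as follows,
using a flag of one zero followed by r ones. The sketch makes one
assumption that the text does not settle: an inserted zero is treated as
part of the transmitted stream when later patterns are matched, so that
source and receiver apply identical rules to the same stream. The
function names are hypothetical.

```python
def stuff(message, r):
    """Insert a zero after every recurrence pattern, then append the flag."""
    pattern = [0] + [1] * (r - 1)            # first r bits of the flag
    out, history = [], []
    for bit in message:
        out.append(bit)
        history = (history + [bit])[-r:]     # last r transmitted bits
        if history == pattern:
            out.append(0)                    # complement of the final flag bit
            history = (history + [0])[-r:]   # assumed: inserted bit joins the history
    return out + pattern + [1]               # flag: a zero followed by r ones


def unstuff(stream, r):
    """Delete inserted zeros and stop at the flag; return the message bits."""
    pattern = [0] + [1] * (r - 1)
    out, history = [], []
    bits = iter(stream)
    for bit in bits:
        out.append(bit)
        history = (history + [bit])[-r:]
        if history == pattern:
            following = next(bits, None)     # inspect the bit after the pattern
            if following == 1:               # final flag bit: end of message
                return out[:-r]              # drop the r flag bits already stored
            if following is None:
                break
            history = (history + [0])[-r:]   # inserted zero: delete it
    raise ValueError("no terminating flag found")


message = [0, 1, 1, 1]
sent = stuff(message, 2)                     # recurrence pattern 01, flag 011
print(sent, unstuff(sent, 2) == message)
```

Because both functions update the same r bit history in the same way,
the receiver's decisions mirror the source's insertions one for one.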

The  first  r bits  of  the  flag  are  referred  to  as  a recurrence 
pattern.  The  insertions  caused  by  the  occurrence  of  this  pattern  in  the 
message  constitute  a component  of  the  length  protocol  information. 

[Figure 4.4: The flag strategy. The figure shows a message (m bits), the
transmitted block (idle bits, start bit, data, r+1 bit flag), and the IBM
message format (synchronous data link control), with the start and stop
information marked.]

To illustrate this strategy, consider the flag of a zero followed
by r ones. If the source identifies a zero followed by r-1 ones in the
message it inserts a zero after the pattern. When the receiver
identifies the same r bit pattern, it inspects the next bit. If this is a
zero, it assumes an insertion which can be deleted. If it is a one, it
assumes the end of a message.

Flag  pattern:  e.’ Ill 11 

Message  data:  M^M_.0111‘  ..... 

with  insertion. 
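The encoding and decoding rules of section 4.9 can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, the flag is taken to be a zero followed by r ones, and the recurrence pattern is matched on the transmitted stream (an assumption, made so that source and receiver stay in step after an insertion).

```python
def flag_encode(message, r):
    """Stuff a 0 after every recurrence pattern, then append the flag.

    Assumes the flag is 0 followed by r ones (r+1 bits) and r >= 2.
    """
    if r < 2:
        raise ValueError("this sketch assumes r >= 2")
    pattern = [0] + [1] * (r - 1)      # first r bits of the flag
    out = []
    for bit in message:
        out.append(bit)
        if out[-r:] == pattern:        # recurrence pattern seen on the line
            out.append(0)              # complement of the final flag bit
    return out + pattern + [1]         # terminating flag


def flag_decode(stream, r):
    """Recover the message from a stream produced by flag_encode."""
    pattern = [0] + [1] * (r - 1)
    raw = []                           # bits seen on the line
    message = []                       # bits kept as message data
    it = iter(stream)
    for bit in it:
        raw.append(bit)
        message.append(bit)
        if raw[-r:] == pattern:
            nxt = next(it, None)       # inspect the following bit
            if nxt is None:
                raise ValueError("stream ended inside a flag")
            if nxt == 1:               # final flag bit: end of message
                return message[:-r]    # drop the first r flag bits
            raw.append(nxt)            # inserted zero: delete it
    raise ValueError("no terminating flag found")
```

With r = 3, the message 0 1 1 is sent as 0 1 1 0 followed by the flag 0 1 1 1, and the receiver recovers 0 1 1.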


4.10  Choosing  an  optimal  flag  pattern 


A flag pattern must be found which minimizes the mean number of
protocol bits, E(S), whilst conforming to the strategy described earlier.
The protocol data is related to the number of insertions, I, whose mean
depends on the likelihood of the r bit recurrence pattern. The flag may
be constructed from two classes of recurrence pattern, each with a
different probability of occurrence.

(1) Identification of the recurrence pattern does not depend on
preceding message bits. An example of such a pattern is the earlier
flag pattern (4.9) of 0111...11. The probability of occurrence, p_1, of
the first r digits of this pattern within the message is

$$p_1 = \left(\tfrac{1}{2}\right)^r \qquad (4.10.1)$$

p_1 is also the probability of an insertion.

(2) Identification of the recurrence pattern does depend on
preceding message data. Consider the success run of r ones occurring at
the M_{j+r} bit in a message:

    M_j  1 1 1 ... 1 1 1     (r ones, ending at bit M_{j+r})

The event is conditioned on the value of M_j. It will occur only
if M_j is a zero, or is itself the last bit of an earlier r bit success
run. The probability of a success run at the M_{j+r} bit is p_2, where,
in the limit as j tends to infinity,

$$p_2 = \frac{(1/2)^{r+1}}{1 - (1/2)^r} \qquad (4.10.2)$$


Under the flag strategy outlined in section 4.9, both the flag
pattern of r+1 bits and the recurrence pattern must be uniquely
distinguishable to the receiver. Any such patterns which depend on
preceding message data are unsuitable candidates for this strategy. For
example, consider a recurrence pattern of r binary ones, and a flag of r
binary ones followed by a zero, 1111...10. If the final bit of message
data is a binary one, the receiver will falsely recognize an insertion in
the second to last bit of the flag, i.e., it will count r binary ones
followed by another one, indicating an insertion.

Although pattern (2) above is less likely than pattern (1) and
would thus have a lower average number of insertions associated with the
recurrence pattern, it does not give unique decoding, as illustrated in
the previous example. Only patterns of the first category are suitable
for the flag strategy, with a probability of insertion p_1.

The flag strategy is currently adopted by IBM in their Synchronous
Data Link Control [9]. The SDLC line protocol places both messages and
control data in similar blocks, or frames, whose format is shown in
Fig. 4.4. The flag (of 8 bits) consists of a 01111110 pattern, and an
insertion is made into the message stream if the source finds five
consecutive binary ones in the outgoing data. The insertion is a binary
zero. If the receiver finds five consecutive ones followed by a zero, it
deletes the last bit. If however it finds six consecutive ones, then
it recognizes the flag pattern.

For example, a message contains data M1 M2 0 1 1 1 1 1 M3 M4. A binary
zero is inserted immediately before bit position M3. The receiver counts
five binary ones, i.e., M1 M2 0 1 1 1 1 1 0 M3, and deletes the next zero.
If M4 is the last bit of a message, the source sends a flag after M4,
i.e., M4 01111110. The receiver then counts six consecutive ones, and
terminates the message. The last binary zero in the flag is quite
unnecessary because the 0111111 pattern is uniquely identifiable alone.
Mention will be made in the next section about how large an optimal flag
pattern must be.
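The SDLC rule described above (a zero inserted after five consecutive ones, six consecutive ones read as the flag) differs from the pattern matching of section 4.9 in that it only counts ones. A minimal Python sketch follows; the function names are hypothetical, and only the terminating flag is modelled, not the address, control or error fields of a real SDLC frame.

```python
FLAG = [0, 1, 1, 1, 1, 1, 1, 0]        # the 8 bit SDLC flag, 01111110


def sdlc_stuff(bits):
    """Insert a 0 after every run of five consecutive ones."""
    out, ones = [], 0
    for b in bits:
        out.append(b)
        ones = ones + 1 if b == 1 else 0
        if ones == 5:
            out.append(0)              # inserted zero
            ones = 0
    return out


def sdlc_send(bits):
    """Stuffed message followed by the terminating flag."""
    return sdlc_stuff(bits) + FLAG


def sdlc_receive(stream):
    """Delete inserted zeros and stop at the first flag."""
    message, ones = [], 0
    it = iter(stream)
    for b in it:
        message.append(b)
        ones = ones + 1 if b == 1 else 0
        if ones == 5:
            nxt = next(it, None)
            if nxt == 0:               # five ones then a zero: delete it
                ones = 0
            elif nxt == 1:             # six ones: flag recognized
                return message[:-6]    # drop the flag's 0 and five ones
            else:
                raise ValueError("stream ended inside a flag")
    raise ValueError("no flag found")
```

For the example in the text with M1 M2 = 1 0 and M3 M4 = 1 0, `sdlc_stuff([1, 0, 0, 1, 1, 1, 1, 1, 1, 0])` places the inserted zero directly before M3.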


4.11  First  moment  of  block  length,  E(B) 

The protocol data, S, for the flag strategy consists of a flag of
r+1 bits, a start bit, and I insertions, where

$$S = (r+1) + 1 + I \qquad (4.11.1)$$

and the expected block length, E(B) = E(M) + E(S), is

$$E(B) = E(M) + E(I) + r + 2 \qquad (4.11.2)$$

An insertion is made in the message data when an r bit pattern
occurs, which is identical to the first r bits of the flag pattern. The
probability of such an event, P(I), is from (4.10.1),

$$P(I) = \left(\tfrac{1}{2}\right)^r \qquad (4.11.3)$$


A message of length E(M) bits can only have insertions in E(M)-r
positions, each with probability P(I). The mean number of insertions,
E(I), is the sum of the expectations of an insertion at each possible
location, and is given by

$$E(I) = (E(M) - r)\left(\tfrac{1}{2}\right)^r \qquad (4.11.4)$$


To obtain a minimum block length whilst employing the flag
strategy, the value of r+1 may be chosen as

$$r + 1 = \lceil \log_2(E(M)\ln 2) \rceil \qquad (4.11.5)$$

This result is derived by differentiating E(B) with respect to r, and
equating to zero. The resemblance of the flag strategy to the Huffman
encoding becomes apparent when the flag is compared to the length
specifier.


A further constraint is made upon E(M)ln2 in order to simplify
comparison between strategies. For E(M)ln2 an integer power of 2, the
expected insertions, E(I), becomes

$$E(I) = \frac{2}{\ln 2} - \frac{2\log_2(E(M)\ln 2)}{E(M)\ln 2} + \frac{2}{E(M)\ln 2} \qquad (4.11.6)$$

Neglecting end effects, which are negligible for E(M) >> r,

$$E(I) \approx 2.886, \quad E(M) \gg r \qquad (4.11.7)$$

$$E(B) \approx E(M) + \log_2(E(M)\ln 2) + 3.886 \qquad (4.11.8)$$

The terms omitted in (4.11.8) are of order log(E(M)ln2)/E(M) and
1/E(M). For large values of E(M), these do not contribute significantly
to the mean value, E(B).
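The choice of r in (4.11.5) and the resulting values of E(I) and E(B) can be checked numerically. The sketch below simply evaluates (4.11.2), (4.11.4) and (4.11.5); the function names are hypothetical and the mean message length of 1000 bits is an arbitrary illustration.

```python
import math


def optimal_r(mean_len):
    """r chosen from (4.11.5): r + 1 = ceil(log2(E(M) ln 2))."""
    return math.ceil(math.log2(mean_len * math.log(2))) - 1


def expected_insertions(mean_len, r):
    """Mean number of insertions, (4.11.4)."""
    return (mean_len - r) * 0.5 ** r


def flag_block_length(mean_len, r):
    """Expected block length, (4.11.2)."""
    return mean_len + expected_insertions(mean_len, r) + r + 2


if __name__ == "__main__":
    mean_len = 1000.0
    for r in range(7, 13):
        print(r, round(flag_block_length(mean_len, r), 4))
    print("r from (4.11.5):", optimal_r(mean_len))
```

For E(M) = 1000 the tabulated block lengths are smallest at the r given by (4.11.5). When E(M)ln2 is an exact power of 2, E(I) evaluates close to the 2.886 of (4.11.7).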

4.12 Second moment of block length, E(B^2)

The second moment of B is complicated by the presence of M+I terms
whose statistics are not independent. In Appendix 4d the moment
generating function of this sum is derived, and E(M+I), E((M+I)^2) are
calculated.

$$E(M+I) = E(M) + E(I) \qquad (4.12.1)$$

$$E((M+I)^2) \approx 2E^2(M) - E(M) + 2E^2(I) - E(I) + 2(r-1)E(I) + 4E(M)E(I) \qquad (4.12.2)$$

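The moments of I are also obtained in Appendix 4d, where

$$E(I) \approx 2.886, \quad E(M) \gg r \qquad (4.12.3)$$

$$E(I^2) = E(I)\left[1 + \left(\tfrac{1}{2}\right)^{r-1}\left(1 - 1/E(M)\right)^{r} E(M)\right] = 2.886\left(1 + \frac{4}{\ln 2}\left(1 - O(\log E(M)/E(M))\right)\right)$$

$$E(I^2) \approx 19.542 + O(\log E(M)/E(M)) \qquad (4.12.4)$$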

The second moment of block length can now be evaluated as a
function of E(M); from (4.11.2)

$$E(B^2) = E((M+I)^2) + 2E(M+I)(r+2) + (r+2)^2 \qquad (4.12.5)$$

Inserting the moments of (M+I) given in (4.12.1) and (4.12.2),

$$E(B^2) = 2E^2(M) + E(M)(3 + 2r + 4E(I)) + 2E^2(I) + E(I)(1 + 4r) + (r+2)^2 \qquad (4.12.6)$$

where E(I) is given in (4.12.3).


Appendix  4a 


Probability distribution of N

The number of packets required to transmit a message of length M
bits is given by the random variable, N;

$$N = \lceil M/L \rceil$$

The probability distribution of variable M is geometric,

$$P_M(m) = (1-a)\,a^{m-1}, \quad m \ge 1 \qquad (A1)$$

with a mean value, E(M);

$$E(M) = (1-a)^{-1} \qquad (A2)$$

The definition of N above may be rewritten as

$$(N-1)L < M \le NL$$

which gives a probability mass function

$$P_N(n) = \Pr((n-1)L < M \le nL)$$

$$P_N(n) = (1-a^L)\,a^{(n-1)L}, \quad n \ge 1 \qquad (A3)$$

The moments of N are simply calculated from (A3);

$$E(N) = (1-a^L)^{-1} \qquad (A4)$$

$$E(N^2) = 2E^2(N) - E(N) \qquad (A5)$$

and

$$\mathrm{Var}(N) = E^2(N) - E(N) \qquad (A6)$$

Appendix 4b


Optimal packet length, L°, for the
fixed packet strategy

An expression was obtained in section (4.3) for the expected block
length of a message together with its protocol information. The block
length was found to be functionally related to expected message length,
E(M), through variable a, and also to packet length, L. It is possible
to minimize block length with respect to L, as described in section
(3.5). An exact relationship between L and E(M) is difficult to obtain;
however a useful approximation can be made and tested.

The block length of a single message is given by equation (4.3.4).
The expected number of packets, E(N), employed in one message is derived
in Appendix (4a). Taking the first derivative of E(B), and equating
to zero to find the minimum value:

$$\frac{dE(B)}{dL} = 0 = \frac{(L+1)\,a^L \ln(a)}{(1-a^L)^2} + \frac{1}{(1-a^L)} + \frac{1}{L\ln(2)} \qquad (B1)$$

Defining an additional variable, x, to obtain a parametric equation
pair between L and a; from Appendix (4a),

$$a = 1 - 1/E(M) \qquad (B2)$$

$$x = -L\ln(a) \quad \text{or} \quad a^L = e^{-x} \qquad (B3)$$

At the minimum of E(B), as given in (B1),

$$0 = \frac{(\ln(a) - x)\,e^{-x}}{(1-e^{-x})^2} + \frac{1}{1-e^{-x}} - \frac{\ln(a)}{x\ln(2)}$$

Multiplying the last expression by e^x (1-e^{-x})^2,

$$0 = \ln(a) - x + e^x - 1 - \frac{\ln(a)\,(e^x + e^{-x} - 2)}{x\ln(2)}$$

which simplifies into an expression for -ln(a),

$$-\ln(a) = \frac{e^x - 1 - x}{1 - (e^x + e^{-x} - 2)/x\ln(2)} \qquad (B4)$$

By expanding terms in e^x and e^{-x}, it is possible to obtain an
approximate result for -ln(a) as a series of decreasing terms in x,
x being less than unity.

$$-\ln(a) = \frac{x^2}{2} + x^3\left[\frac{1}{6} + \frac{1}{2\ln(2)}\right] + x^4\left[\frac{1}{24} + \frac{1}{6\ln(2)} + \frac{1}{2\ln^2(2)}\right] + \cdots$$

Ignoring terms in x^4, x^5 and higher powers, the following
approximation can be made for ln(a):

$$-\ln(a) \approx \frac{x^2}{2} + x^3\left[\frac{1}{6} + \frac{1}{2\ln(2)}\right] \qquad (B5)$$

This approximate expression gives a value for x which may be
substituted back into (B3) to obtain L.

$$x^2 = \frac{-2\ln(a)}{1 + x\,(1/3 + 1/\ln(2))}$$

$$x \approx \sqrt{-2\ln(a)}\,\left[1 - \sqrt{-2\ln(a)}\,(1/6 + 1/2\ln(2))\right] \qquad (B6)$$

$$L = \frac{x}{-\ln(a)} \approx (2E(M))^{1/2} - 1.782 + O(E^{-1/2}(M)) \qquad (B7)$$


Table 4-1 evaluates (B4) directly to gain an accurate numerical
correspondence between parameter, x, and L°. The approximate expression
for L° is also evaluated for the same values of x, and listed beside
the results achieved without approximation. In the range 10 < E(M) < 10^5,
the correspondence is close, especially as E(M) grows larger. Beyond
this range, the higher powers of x are negligible, and improve the
correspondence further.

In practice, packet lengths such as the L° and L' listed in Table 4-1
are integer valued, and so there is no real difference between the
approximation, L', and the exact value, L°.

Table 4-1

Fixed Packet Length, L

  x       E(M)        L°       E(N°)      L'     E(N')     Δ
  0.3     11.8        3.4      3.846      4      3.353     0.077
  0.2     33.7        6.64     5.516      7      5.260     0.012
  0.1     166.0       16.55    10.508     17     10.244    0.01
  0.09    208.9       18.76    11.616     19     11.476    0.006
  0.08    269.6       21.53    13.006     22     12.739    0.008
  0.07    358.9       25.09    14.790     26     14.291    0.04
  0.06    497.8       29.84    17.171     30     17.082    0.001
  0.05    730.5       36.40    20.559     37     20.234    0.008
  0.04    1.2x10^3    46.49    26.304     47     26.025    0.05
  0.03    2.1x10^3    63.14    33.754     64     33.307    0.02
  0.02    4.8x10^3    96.47    50.253     97     49.981    0.05
  0.01    1.96x10^4   196.5    100.244    197    99.991    0.001
  0.005   7.93x10^4   396.5    200.500    397    200.250   0.014

Notes:

E(M)    Mean message length
L°      Packet length calculated from (B3) and (B4); the optimum value.
E(N°)   The mean number of packets associated with packet length L°
L'      Packet length calculated from approximation (4.3.7), and
        integer rounded for practical purposes.
E(N')   The mean number of packets associated with packet length L'
Δ       Numerical difference between block lengths derived from
        (4.3.4), using L°, E(N°) for E(B°) and L', E(N') for E(B'),
        Δ = E(B') - E(B°)
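The x, E(M), L° and E(N°) columns of Table 4-1 can be regenerated from (B4), (B3), (A2) and (A4). The following Python sketch does so for a given x; the function name is hypothetical, and the last digits may differ slightly from the printed table.

```python
import math


def optimum_from_x(x):
    """Return (E(M), L_opt, E(N_opt)) for a given parameter x."""
    ex, emx = math.exp(x), math.exp(-x)
    # (B4): -ln(a) as a function of x
    minus_ln_a = (ex - 1 - x) / (1 - (ex + emx - 2) / (x * math.log(2)))
    a = math.exp(-minus_ln_a)
    mean_len = 1 / (1 - a)             # (A2)
    l_opt = x / minus_ln_a             # (B3): x = -L ln(a)
    mean_packets = 1 / (1 - emx)       # (A4) with a^L = e^{-x}
    return mean_len, l_opt, mean_packets


if __name__ == "__main__":
    for x in (0.3, 0.2, 0.1, 0.05, 0.01):
        e_m, l_opt, e_n = optimum_from_x(x)
        print(f"x={x:<5} E(M)={e_m:10.1f}  L={l_opt:7.2f}  E(N)={e_n:8.3f}")
```

For x = 0.1 this gives E(M) of about 166, L° of about 16.55 and E(N°) of about 10.508, in agreement with the corresponding row of the table.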


Appendix 4c


Second moment of the sum of two dependent
random variables, M and N


A message of length M bits is transmitted in N packets where both
M and N have a geometric probability distribution (see Appendix 4a).

The sum of the two variables forms a new random variable, C, whose
probability mass function is P_C(c) and characteristic function, C(s).

$$C = M + N \qquad (C1)$$

$$C(s) = \sum_{c} P_C(c)\,s^c \qquad (C2)$$

The probability mass function, P_C(c), may be obtained by
considering (A1) and (C1), and gives an expression for C(s) as follows:

$$C(s) = \sum_{i \ge 1}\ \sum_{m=(i-1)L+1}^{iL} P_M(m)\,s^{m+i} \qquad (C3)$$

Performing the double summation in (C3), the characteristic
function may be evaluated into the expression:

$$C(s) = \frac{(1-a)\,s^2\,(1-(as)^L)}{(1-as)\,(1-a^L s^{L+1})} \qquad (C4)$$

The first moment of C is obtained by taking the first derivative of
C(s), and putting (s=1)

$$\left.\frac{dC(s)}{ds}\right|_{s=1} = \frac{1}{(1-a)} + \frac{1}{(1-a^L)}$$

$$E(C) = E(M) + E(N) \qquad (C5)$$


The second moment of C is obtained by taking the second derivative
of the characteristic function, C(s)

$$\left.\frac{d^2C(s)}{ds^2}\right|_{s=1} = E(C^2) - E(C) \qquad (C6)$$

Consider the factorization of C(s) into two components

$$C(s) = Q(s)\,s^2 \qquad (C7)$$

where the derivatives of Q(s) may be easily calculated

$$Q(s) = \frac{(1-a)\,(1-(as)^L)}{(1-as)\,(1-a^L s^{L+1})} \qquad (C8)$$


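$$\left.\frac{dQ(s)}{ds}\right|_{s=1} = \frac{a}{(1-a)} + \frac{a^L}{(1-a^L)} \qquad (C9)$$

$$\left.\frac{d^2Q(s)}{ds^2}\right|_{s=1} = \frac{2a^2}{(1-a)^2} + \frac{2a^{2L}}{(1-a^L)^2} + \frac{2La^L}{(1-a^L)^2} + \frac{(2a)(a^L)}{(1-a)(1-a^L)} \qquad (C10)$$

The second derivative in (C6) becomes

$$\left.\frac{d^2C(s)}{ds^2}\right|_{s=1} = \left.\frac{d^2Q(s)}{ds^2}\right|_{s=1} + 4\left.\frac{dQ(s)}{ds}\right|_{s=1} + 2Q(1)$$

$$= \frac{2a^2}{(1-a)^2} + \frac{2a^{2L}}{(1-a^L)^2} + \frac{2La^L}{(1-a^L)^2} + \frac{(2a)(a^L)}{(1-a)(1-a^L)} + \frac{4a}{(1-a)} + \frac{4a^L}{(1-a^L)} + 2 \qquad (C11)$$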

The second moment, E(C^2), may be obtained from (C11) in terms of
E(M) and E(N), where the following identities are required:

$$E(M) = (1-a)^{-1}$$

$$E(N) = (1-a^L)^{-1}$$

$$E(M^2) = 2E^2(M) - E(M)$$

$$E(N^2) = 2E^2(N) - E(N) \quad \text{(as in App. 4a)}$$

The second derivative at (s=1) becomes

$$\left.\frac{d^2C(s)}{ds^2}\right|_{s=1} = 2E^2(M) + 2E^2(N) + 2E(M)E(N) + 2L\,\mathrm{var}(N) - 2E(M) - 2E(N) \qquad (C12)$$

Combining (C12) with (C6), the second moment becomes:

$$E(C^2) = 2E^2(M) + 2E^2(N) + 2E(M)E(N) - E(M) - E(N) + 2L\,\mathrm{var}(N) \qquad (C13)$$
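Result (C13) can be checked against a direct summation over the geometric distribution (A1), with N taken as the number of L bit packets needed for M bits. The Python sketch below is only a numerical check; the function names, the truncation point of the sum and the parameter values are illustrative choices.

```python
def second_moment_direct(a, L, terms=5000):
    """E((M+N)^2) summed directly over P_M(m) = (1-a) a^(m-1)."""
    total = 0.0
    for m in range(1, terms + 1):
        p = (1 - a) * a ** (m - 1)
        n = -(-m // L)                 # ceil(m / L): packets needed
        total += p * (m + n) ** 2
    return total


def second_moment_c13(a, L):
    """E(C^2) from the closed form (C13)."""
    e_m = 1 / (1 - a)                  # (A2)
    e_n = 1 / (1 - a ** L)             # (A4)
    var_n = e_n ** 2 - e_n             # (A6)
    return (2 * e_m ** 2 + 2 * e_n ** 2 + 2 * e_m * e_n
            - e_m - e_n + 2 * L * var_n)


if __name__ == "__main__":
    for a, L in ((0.9, 4), (0.99, 7)):
        print(a, L, second_moment_direct(a, L), second_moment_c13(a, L))
```

The two columns agree to within the truncation error of the sum, which is negligible for the values of a used here.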

No approximations have been made in deriving this result.

Appendix 4d


The second moment of the sum of two
dependent random variables, M, I


The results associated with the theory of recurrent events [10]
may be applied to the flag strategy described in section (4.9). The
recurrent event is taken here to be the possible replication of the flag
pattern within the message data. Upon each replication, an insertion of
an extra bit is made into the message data. The total number of
insertions, I_m, in a message of m bits will thus correspond to the
number of recurrent events, N_m.

m 

Specifically, by observing a source which produces one bit of data
per instant of time, when busy, a recurrent event, E, is defined to occur
at the (j+r)th instant if the message sequence between the jth and
(j+r)th instants corresponds to the first r bits of the flag pattern.

Before considering the sum of message length, M, and number of
insertions, I, it is necessary to derive the probability mass function of
I_m. The probability distribution of message length is assumed to be
geometric, with mean value (1-a)^{-1}. The distribution function for I is
related to the conditional distribution for I by

$$P_I(i) = \sum_{m=1}^{\infty} P_{I_m}(i\,|\,m)\,P_M(m) \qquad (D1)$$

The probability mass function, P_{I_m}(i)

Let the 0th instant of time be coincident with the occurrence of
an event, E, during the busy state of a source. The waiting time up to
the next recurrent event, E, is defined by a random variable, T_1.
Subsequent waiting times between adjacent events are defined by T_2, etc.

The combined waiting time from the zeroth instant to the rth
event is defined by T^(r), whose value is the sum of adjacent waiting
times, each assumed to be independent;

$$T^{(r)} = T_1 + T_2 + T_3 + \cdots + T_r \qquad (D2)$$

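Two probability assignments can be made for the nth time instant,
f_n and u_n, where, for n > 0,

$$f_n = \Pr(\text{event } E \text{ occurs for the first time at the } n\text{th instant}) \qquad (D3)$$

$$u_n = \Pr(\text{event } E \text{ occurs at the } n\text{th instant}), \qquad f_0 = 0;\ u_0 = 1 \qquad (D4)$$

Generating functions may also be defined for f and u.

$$F(s) = \sum_{n=0}^{\infty} f_n s^n \qquad (D5)$$

$$U(s) = \sum_{n=0}^{\infty} u_n s^n \qquad (D6)$$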


A relationship between F(s) and U(s) may be derived by considering
the probability of an event E at the nth instant,

$$u_n = f_1 u_{n-1} + f_2 u_{n-2} + f_3 u_{n-3} + \cdots + f_n u_0$$

which transforms into the s domain easily, by considering the
right hand side of the expression as a convolution.

$$F(s) = \frac{U(s) - 1}{U(s)} \qquad (D7)$$


It is known from the flag strategy that the probability of an
insertion is (1/2)^r, which is the same as u_n for n >= r:

$$u_n = \begin{cases} \left(\tfrac{1}{2}\right)^r, & n \ge r \\ 0, & 0 < n < r \end{cases} \qquad (D8)$$


The characteristic function, U(s), may be evaluated using
(D4, 6, 8);

$$U(s) = \frac{1 - s + \left(\tfrac{1}{2}\right)^r s^r}{(1-s)} \qquad (D9)$$

The characteristic function F(s) can now be evaluated from
(D9, 7);

$$F(s) = \frac{\left(\tfrac{1}{2}\right)^r s^r}{1 - s + \left(\tfrac{1}{2}\right)^r s^r} \qquad (D10)$$


There is a simple relationship between waiting time, T_1, and
probability of a first event, f_n;

$$\Pr(T_1 = n) = f_n \qquad (D11)$$

If a recurrent event, E, occurs for the second time at the nth
instant, a probability, f_n^(2), is assigned, where from (D2);

$$\Pr(T_1 + T_2 = T^{(2)} = n) = f_n^{(2)} \qquad (D12)$$

Similarly, if the qth event occurs at the nth instant

$$\Pr(T^{(q)} = n) = f_n^{(q)} \qquad (D13)$$


The probability assignment, f_n^(2), is the convolution of f_n with
itself; this suggests that its characteristic function is F^2(s). The
result extends to the qth case above:

$$f_n^{(2)} = f_1 f_{n-1} + f_2 f_{n-2} + \cdots + f_{n-1} f_1$$

Then

$$F^{(2)}(s) = F^2(s)$$

and for the qth case, with * denoting convolution,

$$f_n^{(q)} = f_n * f_n * \cdots * f_n \qquad (D14)$$

$$F^{(q)}(s) = F^q(s) = \sum_{n} f_n^{(q)} s^n \qquad (D15)$$


These results may now be used to obtain P_{I_m}(i).

The probability mass function may be written as

$$P_{I_m}(i) = \Pr(T^{(i+1)} > m) - \Pr(T^{(i)} > m) \qquad (D16)$$

Using the relationship between T^(i) and f_n^(i) in (D13), (D16)
becomes

$$P_{I_m}(i) = \sum_{n=1}^{m} f_n^{(i)} - \sum_{n=1}^{m} f_n^{(i+1)} \qquad (D17)$$


The probability mass function, P_I(i), as defined in (D1) can be
evaluated using (D17) and a geometric distribution for m, and also (D15)

$$P_I(i) = \sum_{m=1}^{\infty} P_{I_m}(i)\,(1-a)\,a^{m-1}$$

$$P_I(i) = \begin{cases} \dfrac{F^i(a)\,(1 - F(a))}{a}, & i \ge 1 \\[2ex] 1 - \dfrac{F(a)}{a}, & i = 0 \end{cases} \qquad (D18)$$


Moments of I, E(I), E(I^2)

Having derived the probability mass function of I, a characteristic
function may be found, using (D18);

$$I(s) = \sum_{i=0}^{\infty} P_I(i)\,s^i$$

$$I(s) = 1 + \frac{F(a)\,(s-1)}{a\,(1 - F(a)s)} \qquad (D19)$$


The first moment is obtained from the first derivative of I(s),
setting s = 1

$$E(I) = \left.\frac{dI(s)}{ds}\right|_{s=1} = \frac{F(a)}{a\,(1 - F(a))}$$

$$E(I) = \frac{\left(\tfrac{1}{2}\right)^r a^{r-1}}{(1-a)} \qquad (D20)$$


An approximation of the first moment, E(I), can be made using the
substitution suggested in section (4.11) for r

$$(r + 1) = \log_2(E(M)\ln 2)$$

where

$$E(M) = (1-a)^{-1}$$

then

$$E(I) = \frac{2}{E(M)\ln 2}\,E(M)\,(1 - 1/E(M))^{r-1} \approx \frac{2}{E(M)\ln 2}\,E(M)\,(1 - r/E(M))$$

or

$$E(I) \approx \frac{2}{\ln 2} - \frac{2\log_2(\ln 2\,E(M))}{\ln 2\,E(M)} \qquad (D21)$$

Terms of order 1/E(M) and lower magnitudes are ignored. The
second moment of I is obtained from the second derivative of I(s),

$$\left.\frac{d^2I(s)}{ds^2}\right|_{s=1} = E(I^2) - E(I) \qquad (D22)$$

$$E(I^2) = \frac{F(a)\,(1 + F(a))}{a\,(1 - F(a))^2}$$
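The closed forms for the moments of I can be evaluated numerically and compared with the constants 2.886 and 19.542 quoted in section 4.12. The Python sketch below uses (D10), (D20) and the expression for E(I^2) that follows (D22); the function names are hypothetical, and r = 13 is an arbitrary choice with E(M) set so that r+1 = log2(E(M)ln2).

```python
import math


def F(s, r):
    """Generating function of the first recurrence time, (D10)."""
    half_r = 0.5 ** r
    return half_r * s ** r / (1 - s + half_r * s ** r)


def insertion_moments(mean_len, r):
    """Return (E(I), E(I^2)) for geometric message lengths."""
    a = 1 - 1 / mean_len               # (A2)
    f = F(a, r)
    e_i = f / (a * (1 - f))            # (D20)
    e_i2 = f * (1 + f) / (a * (1 - f) ** 2)
    return e_i, e_i2


if __name__ == "__main__":
    r = 13
    mean_len = 2 ** (r + 1) / math.log(2)
    e_i, e_i2 = insertion_moments(mean_len, r)
    print(f"E(M)={mean_len:.0f}  E(I)={e_i:.4f}  E(I^2)={e_i2:.3f}")
```

For large E(M) the printed values approach 2.886 and 19.542, the remaining gap being of the order log E(M)/E(M) noted in (4.12.4).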


The sum of two dependent variables, M+I

Having obtained P_{I_m}(i) and knowing the distribution of M, P_M(m),
the characteristic function of a new random variable, C, defined as the
sum of M and I, can be found.

$$C = M + I$$

$$P_C(c) = \sum_{m=1}^{c} P_{I_m}(c-m)\,P_M(m); \quad c \ge 1 \qquad (D23)$$

The characteristic function of C, C(s), is defined as

$$C(s) = \sum_{c=1}^{\infty} P_C(c)\,s^c \qquad (D24)$$

Substituting P_C(c), as given in (D23), into (D24) and rearranging
the summations,

$$C(s) = \sum_{m=1}^{\infty} P_M(m)\,s^m \sum_{n=0}^{\infty} P_{I_m}(n)\,s^n \qquad (D25)$$

Using the expression for P_{I_m}(i) given in (D17) and using (D15),

$$F^i(as) = \sum_{n=0}^{\infty} f_n^{(i)}\,(as)^n$$

the summation in (D25) may be simplified to

$$C(s) = \sum_{m=1}^{\infty} P_M(m)\,s^m \sum_{n=1}^{\infty} (s^n - 1)\left[\sum_{p=1}^{m} f_p^{(n)} - \sum_{p=1}^{m} f_p^{(n+1)}\right] + M(s) \qquad (D26)$$

$$C(s) = \frac{(1-a)\,(s-1)\,F(as)}{(1-as)\,a\,(1 - F(as)s)} + M(s) \qquad (D27)$$


where the characteristic function of M is M(s). The first moment of
C, E(C), is found by taking the first derivative of C(s)

$$E(C) = \left.\frac{dC(s)}{ds}\right|_{s=1} = \frac{F(a)}{a\,(1 - F(a))} + E(M)$$

Using (D20),

$$E(C) = E(I) + E(M) \qquad (D28)$$


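The second derivative of C(s) gives the second moment,

$$\left.\frac{d^2C(s)}{ds^2}\right|_{s=1} = E(C^2) - E(C) \qquad (D29)$$

Using (D27)

$$\left.\frac{d^2C(s)}{ds^2}\right|_{s=1} = \frac{2}{a}\left[\frac{(F(a))^2}{(1-F(a))^2} + \frac{aF(a)}{(1-a)(1-F(a))} + \frac{aF'(a)}{(1-F(a))^2}\right] + M''(1) \qquad (D30)$$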


From (D10), the first derivative, F'(a), can be found. Then

$$\frac{F'(a)}{(1-F(a))^2} = \frac{(r-1)\left(\tfrac{1}{2}\right)^r a^{r-1}}{(1-a)} + \frac{\left(\tfrac{1}{2}\right)^r a^{r-1}}{(1-a)^2} = (r-1)E(I) + E(I)E(M) \qquad (D31)$$

where we have used (D20).

Substituting (D31) back into (D30) and rearranging to obtain E(C^2),
and using the mean value of E(C) in (D28),

$$E(C^2) = 2E^2(M) + 2E^2(I) - E(M) - E(I) + 2(r-1)E(I) + 4E(I)E(M) - \frac{2E^2(I)}{E(M)} \qquad (D32)$$

No approximations have been made in obtaining the result (D32). However
in section 4, the final term in E^2(I)/E(M) will be neglected when using
this result because it is of order less than unity in the range
10^2 < E(M) < 10^5.

Section 5


Conclusions for Stop Protocols


5.1  Introduction 


A criterion for evaluating protocol strategies in terms of waiting
time, W, in the source node queue and in transmission was discussed in
section (3), where

$$E(W) = E(B) + \frac{\lambda E(B^2)}{2\,(1 - \lambda E(B))} \qquad (5.1.1)$$


In the fourth section first and second moments of block length, B,
were derived for three protocol strategies; the fixed packet, flag and
Huffman encoding of length schemes. To prove that the waiting time, W_i,
for the ith strategy is shorter than that of the jth strategy, W_j, it is
sufficient to show that the two conditions are met:

$$E(B_i) < E(B_j) \quad \text{and} \quad E(B_i^2) < E(B_j^2) \qquad (5.1.2)$$

so that from (5.1.1)

$$W_i < W_j \qquad (5.1.3)$$


In comparing the three strategies analyzed in section 4, condition
(5.1.2) may be employed to give a simple ordering of efficiencies.
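Equation (5.1.1) is straightforward to evaluate once the two moments of block length are known. A minimal Python sketch follows; the function name is hypothetical, block lengths are measured in bit times on a line of unit rate, and the arrival rate (written lam below) is in messages per bit time. These units are assumptions made for the illustration.

```python
def mean_waiting_time(lam, e_b, e_b2):
    """Mean time in queue and in transmission, equation (5.1.1)."""
    rho = lam * e_b                    # utilization of the line
    if not 0 <= rho < 1:
        raise ValueError("the queue is stable only for lam * E(B) < 1")
    return e_b + lam * e_b2 / (2 * (1 - rho))


if __name__ == "__main__":
    # Same mean block length, different second moments.
    print(mean_waiting_time(0.05, 10.0, 100.0))   # constant block length
    print(mean_waiting_time(0.05, 10.0, 200.0))   # more dispersed lengths
```

With the same mean block length, the larger second moment gives the longer waiting time, which is why condition (5.1.2) involves both moments.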


5.2 Comparing the first moments, E(B)

The first moment is especially important in the analysis of
protocol information, because it relates directly to the entropy of the
information (section 4.5). In comparing expected block length between
strategies, it is only necessary to compare average protocol data, E(S),
because E(M) is common to all schemes.

$$E(B) = E(M) + E(S) \qquad (5.2.1)$$


The first moments of block length were found in section 4 to have
the values: (4.3.10), (4.11.8), (4.6.6),

Fixed Packet:  E(B_1) = E(M) + 1.43 E^{1/2}(M) + (1/2) log2(2E(M))

Flag:          E(B_2) = E(M) + log2(E(M)ln2) + 3.886

Huffman:       E(B_3) = E(M) + log2(E(M)ln2) + 3.0


For purpose of comparison, it is convenient to introduce the common
parameter, r+1, from the flag strategy (4.11.5), providing the parametric
equation for all schemes

$$E(S_i) = a_i E^{1/2}(M) + b_i r + c_i \qquad (5.2.2)$$

where

$$r + 1 = \log_2(E(M)\ln 2) \qquad (5.2.3)$$

and

$$E(M) = \frac{2^{r+1}}{\ln 2} \qquad (5.2.4)$$

Table 5.1 below lists the coefficients for the three strategies,
and graph 5.1 plots E(S_i) over the range 3 <= r <= 13, or
23 < E(M) < 2 x 10^4.

Table 5.1 Mean value of protocol data

Fixed packet:  E(S_1) = 1.4 E^{1/2}(M) + 0.5r + 1.26

Flag:          E(S_2) = 0 + r + 4.89

Huffman:       E(S_3) = 0 + r + 4.00

[Graph 5.1: First moment of block length (omitting the common term, E(M)).]

Protocol data, E(S), has been derived from E(B) - E(M), in the
equations of section 4. Substitution has been made for E(M) according
to Equation (5.2.4).

From the graph and table it can be seen that

$$E(S_1) > E(S_2) > E(S_3) \qquad (5.2.5)$$

$$E(B_1) > E(B_2) > E(B_3) \qquad (5.2.6)$$
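The ordering in (5.2.5) can be tabulated directly from the coefficients of Table 5.1 with E(M) substituted from (5.2.4). The Python sketch below does this over the plotted range of r; the function name and dictionary keys are hypothetical.

```python
import math


def mean_protocol_bits(r):
    """E(S) for the three strategies, using the Table 5.1 coefficients."""
    e_m = 2 ** (r + 1) / math.log(2)   # (5.2.4)
    return {
        "fixed packet": 1.4 * math.sqrt(e_m) + 0.5 * r + 1.26,
        "flag": r + 4.89,
        "huffman": r + 4.00,
    }


if __name__ == "__main__":
    for r in range(3, 14):
        s = mean_protocol_bits(r)
        print(r, round(s["fixed packet"], 2), s["flag"], s["huffman"])
```

Over 3 <= r <= 13 the fixed packet value is always the largest and the Huffman value the smallest, with a constant gap of 0.89 bits between the flag and Huffman strategies.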

5.3 Discussion of first moments

The fixed packet strategy contains a non zero E^{1/2}(M) coefficient,
which dominates E(S_1) for large values of E(M). This term is associated
with the redundancy of the final packet, which is eliminated in the
Huffman scheme by relocating the length specifier. Similarly in the flag
strategy there is no equivalent redundancy.

The Huffman length encoding and flag strategies are remarkably
close, and there is a simple explanation for this similarity. The
Huffman scheme employs two concatenated code words: one a unary encoding
of packets, N, and the second a variable length codeword expressing final
packet redundancy. The second codeword has the same expected value as
the flag length (5.2.3). Each insertion has probability (1/2)^r, which is
equivalent to an insertion per 2^r bits, on average. This corresponds to
the busy bit preceding each packet in the Huffman code (the unary
codeword contains N+1 bits).

The Huffman code is marginally more efficient than the flag
strategy (by 0.89 bits), but both schemes vary widely from the fixed
packet strategy.

5.4 Comparing second moments, E(B^2)

The second moment gives a measure of the dispersion of block length
about its mean value, and influences the waiting time of messages in the
source queue (see 5.1.1).

Employing the parameter r+1, a general equation may be written for
the second moment, E(B^2);

$$E(B_i^2) = 2E^2(M) + a_i E^{3/2}(M) + (b_i r + c_i)E(M) + (d_i r + e_i)E^{1/2}(M) + f_i r^2 + g_i r + h_i \qquad (5.4.1)$$

Each strategy has a second moment defined by the set of
coefficients (a, b, c, d, e, f, g, h), as listed in Table 5.2 below. Graph
5.2 plots E(B^2) over the range of r, 3 < r < 13. The term in E^2(M) is
omitted in the graphical values, being common to each strategy.

Table 5.2 Second moments of block length

Omitting the common term in 2E^2(M) from E(B^2), the second moments
for the fixed packet, flag and Huffman schemes are;

E(B_1^2) = 4.3 E^{3/2}(M) + (r+3)E(M) + (1.43r+2.80)E^{1/2}(M) + 0.25r^2 + 1.16r + 0.15

E(B_2^2) = 0 + (2r+14.54)E(M) + 0 + r^2 + 15.54r + 23.55

E(B_3^2) = 0 + (2r+9.77)E(M) + 0 + r^2 + 8.0r + 16.0

The results above were taken from (4.3.11), (4.12.6) and (4.6.5)
respectively, where log2E(M) is replaced by f(r) defined in (5.2.3).

[Graph 5.2: Second moment of block length, with curves for the fixed
packet, flag and Huffman strategies.]

From the graph, and coefficients above, it can be seen that

$$E(B_1^2) > E(B_2^2) > E(B_3^2) \qquad (5.4.2)$$

The magnitudes of the first moments have a similar ordering (5.2.6),
such that condition (5.1.2) is applicable to the three strategies under
discussion. The order of magnitude of waiting times becomes

$$W_1 > W_2 > W_3 \qquad (5.4.3)$$

where the waiting time of the fixed packet strategy is W_1, that of the
flag strategy is W_2, and that of the Huffman length encoding strategy
is W_3.

5.5 Discussion of second moments

The redundancy in the final packet of the fixed packet strategy
contributes a term in E^{3/2}(M) which is not present in the other
strategies. This term causes E(B_1^2) to greatly exceed the other
moments, for larger values of E(M).

The flag and Huffman encoding strategies contain terms of similar
order, but with different coefficient values. The difference between
E(B_2^2) and E(B_3^2), for large values of E(M), becomes 4.77E(M), which
increases exponentially with r. This is a more significant difference
than 0.89 bits in the first moment, and illustrates the greater
uncertainty of the number of protocol bits in the flag strategy.
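The orderings (5.4.2) and (5.4.3) can be reproduced from the coefficients of Tables 5.1 and 5.2 together with (5.1.1). The Python sketch below is illustrative only: the names are hypothetical, the coefficient values are those read from the tables, and the arrival rate is chosen (as an assumption) so that the fixed packet strategy, which has the longest blocks, runs at a utilization of 0.5.

```python
import math


def block_moments(r):
    """Return {strategy: (E(B), E(B^2))} from Tables 5.1 and 5.2."""
    e_m = 2 ** (r + 1) / math.log(2)   # (5.2.4)
    root = math.sqrt(e_m)
    common = 2 * e_m ** 2              # term common to every strategy
    return {
        "fixed packet": (
            e_m + 1.4 * root + 0.5 * r + 1.26,
            common + 4.3 * e_m ** 1.5 + (r + 3) * e_m
            + (1.43 * r + 2.80) * root + 0.25 * r ** 2 + 1.16 * r + 0.15,
        ),
        "flag": (
            e_m + r + 4.89,
            common + (2 * r + 14.54) * e_m + r ** 2 + 15.54 * r + 23.55,
        ),
        "huffman": (
            e_m + r + 4.00,
            common + (2 * r + 9.77) * e_m + r ** 2 + 8.0 * r + 16.0,
        ),
    }


def waiting_times(r, load=0.5):
    """E(W) from (5.1.1) for each strategy at a common arrival rate."""
    moments = block_moments(r)
    lam = load / max(e_b for e_b, _ in moments.values())
    return {name: e_b + lam * e_b2 / (2 * (1 - lam * e_b))
            for name, (e_b, e_b2) in moments.items()}


if __name__ == "__main__":
    for r in (3, 8, 13):
        print(r, {k: round(v, 1) for k, v in waiting_times(r).items()})
```

At every r in the plotted range the computed waiting times fall in the order fixed packet, flag, Huffman, from longest to shortest.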


5.6  Conclusion 


In the context of a single source/receiver link, three protocol
strategies for transmitting start-stop information were analyzed for
messages with geometrically distributed lengths. The mean waiting time
of messages in the source node queue and transmission was taken as a
performance measure under which the strategies could be compared.
Queueing and service time in store and forward networks are directly
related to transmission delay, which is of practical importance in any
network.

Three strategies were taken from existing networks, including fixed
packet, flag and variable length packet strategies. The latter was based
on ideas from information theory, and was found to be the most efficient
in terms of queueing delays, and average codeword length for the protocol
data.

The  queueing  problem  was  simplified  by  assuming  geometric  message 
length  statistics,  which  although  not  generally  equivalent  to  practical 
cases,  do  exhibit  an  extremal  property.  Such  statistics  maximize  the 
amount  of  protocol  information  required  to  encode  message  length,  and  the 
most  efficient  encoding  for  this  case  satisfies  the  minimax  condition, 
i.e.,  the  most  economic  coding  under  the  worst  source  statistics. 

The queueing problem was analyzed as an M/G/1 process, where the
waiting and service time was found to depend only on the first and
second moments of block length (message and protocol data). The first
moment, E(B), also has significance from a source coding viewpoint. An
efficient coding scheme, in an information theoretic sense, is one that
has an average codeword length close to the source entropy. The Huffman
scheme has a mean redundancy above the source entropy of 0.03
bits/message.
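
The dependence of delay on the first two moments of block length can be
illustrated with the standard Pollaczek-Khinchine mean-value formula for
an M/G/1 queue. The sketch below is only an illustration under stated
assumptions, not the derivation used in this work: it assumes Poisson
arrivals at rate `lam` messages per unit time and a channel that sends
one bit per unit time, so that service time equals block length B. The
function name and its arguments are hypothetical.

```python
def mg1_mean_delay(lam, mean_b, second_moment_b):
    """Mean queueing-plus-service time of an M/G/1 queue.

    lam             -- assumed Poisson arrival rate (messages per unit time)
    mean_b          -- E(B), mean block length in bits (one bit per unit time)
    second_moment_b -- E(B^2), second moment of block length
    """
    rho = lam * mean_b  # channel utilisation
    if not 0.0 <= rho < 1.0:
        raise ValueError("queue is unstable unless lam * E(B) < 1")
    # Pollaczek-Khinchine mean waiting time in the queue
    wait_in_queue = lam * second_moment_b / (2.0 * (1.0 - rho))
    # Add the mean service (transmission) time itself
    return wait_in_queue + mean_b


# Two strategies with the same E(B) but different E(B^2)
low_variance = mg1_mean_delay(lam=0.5, mean_b=1.0, second_moment_b=1.2)
high_variance = mg1_mean_delay(lam=0.5, mean_b=1.0, second_moment_b=2.0)
```

For equal first moments the strategy with the larger second moment
incurs the larger mean delay, which is why both tables 5.1 and 5.2 enter
the comparison.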

The first moments of the three strategies are presented in table
5.1. Both the Huffman and flag strategies achieve a coding redundancy
of less than one bit/message, and have some close similarities. For
instance, the flag closely resembles the length specifier, and the
insertions (occurring every $2^r$ bits, on average) resemble the busy
bits placed before each packet of length L. The fixed packet strategy is
less efficient due to the redundancy in the final packet, which is
eliminated in the Huffman scheme.

The second moments are listed in table 5.2. The variance of protocol
data for each scheme is directly related to the second moments. The
block length of message and protocol data in the Huffman case has a
lower variance than under the flag strategy. This may be understood by
considering the appearance of insertions in the flag strategy in
contrast with the busy bits of the Huffman scheme. The former are
subject to greater statistical uncertainty, and contribute to the higher
overall variance of the flag strategy. The redundancy in the final
packet of the fixed packet strategy makes its second moment considerably
larger than those of the other schemes.

The main theoretical result to emerge from the study is the close
relationship between stop information and message length. This was used
to advantage by devising a protocol strategy using an efficient encoding
of message length. The source coding approach was shown to be efficient
in a practical sense by minimizing queueing and transmission delays in
data networks, as well as reducing average protocol data to a minimum
level.

Some practical results from the analysis include optimum packet
and flag lengths for the appropriate strategies. For example, in the
fixed packet strategy, the packet length which minimizes protocol data
was found to be approximately $(2E(M))^{1/2}$, where the expected
message length is $E(M)$. In the Huffman scheme, the packet size was
$E(M)\ln 2$. In the flag strategy, the optimum flag length was
$\lceil \log_2 (E(M)\ln 2) \rceil$.

Applying these results to the IBM line protocol described in section
4.10, and assuming an average message length of $10^3$ bits, the flag
length would be ten bits. The flag structure would also be modified by
omitting the final binary zero, i.e., 0111111111.
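
The three optimum lengths quoted above can be evaluated numerically for
the same example. The sketch below is a minimal illustration: it reads
the fixed packet optimum as the square root of 2E(M) and the flag
optimum as the ceiling of log2(E(M) ln 2), which are interpretations of
the expressions in the text, and the function name is hypothetical.

```python
import math


def optimum_parameters(mean_message_bits):
    """Return (fixed packet length, Huffman packet size, flag length)."""
    # Fixed packet strategy: approximately (2 E(M))^(1/2)
    fixed_packet = math.sqrt(2.0 * mean_message_bits)
    # Huffman (variable length packet) strategy: E(M) ln 2
    huffman_packet = mean_message_bits * math.log(2.0)
    # Flag strategy: smallest integer not less than log2(E(M) ln 2)
    flag_length = math.ceil(math.log2(mean_message_bits * math.log(2.0)))
    return fixed_packet, huffman_packet, flag_length


fixed_packet, huffman_packet, flag_length = optimum_parameters(1000)
# Flag with the final binary zero omitted: a zero followed by ones
flag = "0" + "1" * (flag_length - 1)
```

With E(M) = 1000 bits this gives a flag of ten bits, in agreement with
the figure quoted for the line protocol example.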

Having completed the discussion on start-stop protocol for a single
source/receiver pair, attention will be given to devising a source code
for start-stop information for many sources and receivers sharing a
single channel.


Section  6 


Protocol for a single link with identical sources

6.1  Introduction

Previous sections have been concerned with protocol for a single
link with one source and receiver. Protocol was necessary to specify
the beginning and end of each message. This required start-stop
information to be transmitted together with the message data. Discussion
is extended here to a single link with identical sources and receivers.
Naively one would expect that in addition to start-stop information
there would need to be additional data conveying destination information
to the receiver node. It will be shown that start-stop protocols are
sufficient to express destination as well as the beginning and end of
messages.

A simple model of a link with identical sources will be developed.
Two coding strategies for protocol information will be presented
and compared: the Huffman and universal coding schemes. Redundancy
of the coding schemes over source entropy will be considered as a
performance measure for making comparisons, and estimating efficiency.

One  consequence  of  this  section  will  be  to  demonstrate  that  addressing 
information  in  a data  network  can  be  avoided  by  selecting  the  appropriate 
encoding  of  protocol  information. 

6.2  Model of a single link with identical sources

A set of K identical, independent, synchronous sources share a
binary channel to a corresponding number of receivers. Each source, k
($1 \le k \le K$), communicates only with its receiver, k. The source
node resembles a concentrator which serves all K sources by inspecting
their contents at each instant of time (see Figure 6.1).

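Each source can exist in one of two states: the idle and busy
states. Transitions between states take place in a synchronous manner
with changing time instants. Each source is represented by a Markov
process (see Figure 6.2), in which an idle source becomes busy with
probability $\delta$ and a busy source becomes idle with probability
$\epsilon$ at each time instant. The probability of being idle is

$$\mathrm{Prob}(\text{idle}) = \frac{\epsilon}{\epsilon + \delta}$$

and the probability of being busy is

$$\mathrm{Prob}(\text{busy}) = \frac{\delta}{\epsilon + \delta}$$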

When idle, the source delivers idle characters, i, and when busy
it delivers binary 0's and 1's, corresponding to the message data.

Information relating to state and message data in a two state
Markov source may be quantified by the source entropy, H(S), where

$$H(S) = \frac{\delta}{\epsilon+\delta} + \frac{\delta}{\epsilon+\delta}\,\mathcal{H}(\epsilon) + \frac{\epsilon}{\epsilon+\delta}\,\mathcal{H}(\delta) \quad \text{bits/unit time}$$

and

$$\mathcal{H}(x) = -x\log(x) - (1-x)\log(1-x) \tag{6.2.1}$$
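
The three contributions to the source entropy can be evaluated
separately. The following is a minimal sketch, assuming logarithms to
base 2 (so that results are in bits) and taking `eps` and `delta` as the
busy-to-idle and idle-to-busy transition probabilities; the function
names are hypothetical.

```python
import math


def binary_entropy(x):
    """Binary entropy function in bits; defined as 0 at x = 0 and x = 1."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def source_entropy_terms(eps, delta):
    """Return (message, stop, start) entropy in bits per unit time."""
    p_busy = delta / (eps + delta)
    p_idle = eps / (eps + delta)
    message = p_busy                        # one message bit per busy instant
    stop = p_busy * binary_entropy(eps)     # uncertainty about message end
    start = p_idle * binary_entropy(delta)  # uncertainty about message start
    return message, stop, start


message, stop, start = source_entropy_terms(eps=0.01, delta=0.001)
entropy_per_unit_time = message + stop + start
```

For long messages and long idle periods the stop and start terms are
small compared with the message term, which indicates how little
protocol information is strictly necessary.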

The entropy of the source provides a lower bound on the average
length of codewords required to transmit all information relating to
that source, including state and message data (see the source coding
theorem, section 3). For K identical independent sources, each with
entropy H(S), the capacity of the binary channel must be greater than
the total entropy of the combined sources, KH(S), in order that reliable
transmission can take place.

Figure 6.1  Identical synchronous source/receiver pairs

The entropy for the combined sources, KH(S), contains three terms
(see equation (6.2.1)). The first, $K\frac{\delta}{\epsilon+\delta}$,
is the average message data per time instant. The second,
$K\frac{\delta}{\epsilon+\delta}\mathcal{H}(\epsilon)$, is the stop
information per unit time for all the sources, and the third,
$K\frac{\epsilon}{\epsilon+\delta}\mathcal{H}(\delta)$, is the start
information.

A meaningful performance measure which is concerned with minimizing
overhead data is the coding redundancy of a protocol over the
source entropy. Huffman encoding of stop information has been shown
to be efficient in this sense, and will be used again for start-stop
protocols. A second scheme involving universal coding [12] will also be
discussed as an alternative approach.

6.3  Huffman encoding of start-stop information

Let $K_I(j)$ and $K_B(j)$ denote the number of sources in the idle and
busy states respectively, during the jth time instant. The total number
of sources is K. As the system enters the (j+1)th time instant, a
finite number of sources, q, change from the idle to the busy state
($0 \le q \le K_I(j)$), and a finite number, p, become idle after
completing a message ($0 \le p \le K_B(j)$). The new state contains
$K_I(j+1)$ idle and $K_B(j+1)$ busy sources.

If initially all sources are assumed idle, and both the source
and receiver nodes maintain a list of idle and busy sources at all
time instants, only information conveying those sources in transition
need be transmitted at any one time instant to update the lists. The
start (and stop) information is the location of the sources in the idle
(and busy) list which become active (inactive). This information is
sufficient to allow the receiver to update its lists, and to allocate
the received message bits to the appropriate receivers, corresponding to
active sources.
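
The list maintenance performed identically at the source and receiver
nodes can be sketched as follows. This is an illustration under
assumptions not stated in the text: both lists are kept sorted by source
number, and the transmitted positions are 0-based indices into the lists
as they stood at the previous time instant. The function name is
hypothetical.

```python
import bisect


def update_lists(idle, busy, start_positions, stop_positions):
    """Apply one time instant of start-stop information to both lists.

    idle, busy      -- sorted lists of source numbers (assumed ordering)
    start_positions -- positions in `idle` of sources becoming busy
    stop_positions  -- positions in `busy` of sources becoming idle
    """
    starting = [idle[i] for i in start_positions]
    stopping = [busy[i] for i in stop_positions]
    new_idle = [s for s in idle if s not in starting]
    new_busy = [s for s in busy if s not in stopping]
    for s in stopping:
        bisect.insort(new_idle, s)   # finished sources rejoin the idle list
    for s in starting:
        bisect.insort(new_busy, s)   # new messages join the busy list
    return new_idle, new_busy


# All sources idle initially; sources 2 and 4 start, then source 2 stops.
idle, busy = update_lists([1, 2, 3, 4, 5], [], [1, 3], [])
idle_next, busy_next = update_lists(idle, busy, [], [0])
```

Since both nodes apply the same update to the same lists, positions
within the lists are sufficient to identify sources, and no explicit
address is transmitted.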

For a time instant, j, a possible format for transmitted data
could be

    ... (t = j-1) | start info. | stop info. | message bits | (t = j+1) ...
                  |<----------------- (t = j) ----------------->|


Before being able to devise a coding scheme for the start-stop
protocol information, it is necessary to identify the structure of the
information source. This may be done for the start information by
considering those idle sources which become active during one time
instant, j. They are chosen randomly out of the list of idle sources,
$K_I(j-1)$, where the probability of q transitions is given by the
binomial distribution

$$\Pr(q) = \binom{K_I(j-1)}{q}\,\delta^{q}\,(1-\delta)^{K_I(j-1)-q}, \qquad 0 \le q \le K_I(j-1) \tag{6.3.1}$$

where $\binom{a}{b} = \dfrac{a!}{(a-b)!\,b!}$.

A similar distribution applies to busy sources which become idle
(p of them), where

$$\Pr(p) = \binom{K_B(j-1)}{p}\,\epsilon^{p}\,(1-\epsilon)^{K_B(j-1)-p}, \qquad 0 \le p \le K_B(j-1) \tag{6.3.2}$$
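
The binomial distributions (6.3.1) and (6.3.2) can be tabulated
directly. The sketch below is a minimal illustration with hypothetical
names; the list size of eleven and the transition probability of 0.1 are
example values only.

```python
from math import comb


def transition_pmf(list_size, prob):
    """Probability of 0, 1, ..., list_size transitions in one time instant.

    list_size -- number of sources in the idle (or busy) list
    prob      -- transition probability, delta (or epsilon)
    """
    return [
        comb(list_size, n) * prob ** n * (1.0 - prob) ** (list_size - n)
        for n in range(list_size + 1)
    ]


pmf = transition_pmf(list_size=11, prob=0.1)
mean_transitions = sum(n * p for n, p in enumerate(pmf))
```

The mean number of transitions equals the list size multiplied by the
transition probability, here 1.1 per time instant.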


The set of all possible outcomes for transitions in either
direction may be enumerated with the help of a coding tree, where each
terminal node represents a unique outcome (Figure 6.3). To construct
a Huffman code for such a finite tree would require an accurate
knowledge of $\epsilon$ and $\delta$, together with much computation
[6]. The coding problem may be simplified by considering the distances
between sources in transition (i.e., run length coding), in the idle and
busy lists.

For sources in the idle state, the probability of a transition
is $\delta$. The probability distribution of the distance, d, between
two transitions is geometric, provided that the list is infinitely long,
where

$$\Pr(d) = \delta(1-\delta)^{d-1}, \qquad 1 \le d < \infty \tag{6.3.3}$$


The condition of infinite length may be realized by concatenating
the lists of idle sources at different time instants into one infinitely
long list. In such a case, the distance between transitions is
described exactly by (6.3.3). The position of time markers corresponding
to the division between lists at different times j-1, j, etc., can be
indicated to the receiver node by an extra bit in the length encoding
of those distances which include the markers (see Figure 6.4).

Figure 6.3  Coding tree for transitions to the busy state (K sources)

To demonstrate the encoding of a distance between sources in
transition by the Huffman coding scheme, consider a typical distance,
d. Define a fixed parameter L which depends on the mean distance between
transitions in the idle list, $1/\delta$. From (4.6.5),

$$L = (1/\delta)\ln 2 \tag{6.3.4}$$

Define a second integer variable, N, such that

$$N = \lceil d/L \rceil \tag{6.3.5}$$

The codeword for d is constructed in two parts. The first is a
unary code of N-1 binary ones followed by a binary zero. The second
is a variable length encoding of [d] Mod(L), whose mean length is
$\log_2(L)$, resembling the coding employed in section 4.6. The
distance, d, may be expressed as

$$d = (N-1)L + [d]\,\mathrm{Mod}(L) \tag{6.3.6}$$
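
The two-part codeword can be sketched as a run length code of the
following form. This is an illustration under assumptions, not the
exact construction of section 4.6: L is rounded to an integer, the
remainder is taken in the range 1 to L so that N-1 is the quotient of
d-1 by L, and the remainder is written in a truncated binary code whose
length is either the integer part of log2(L) or one bit more. All
function names are hypothetical.

```python
import math


def run_length_parameter(delta):
    """Integer L close to (1/delta) ln 2; the rounding is an assumption."""
    return max(1, round(math.log(2.0) / delta))


def _remainder_bits(r, L):
    """Truncated binary code for r in 0..L-1 (prefix-free)."""
    if L == 1:
        return ""
    k = L.bit_length() - 1              # integer part of log2(L)
    short_count = (1 << (k + 1)) - L    # how many remainders get k bits
    if r < short_count:
        return format(r, "0{}b".format(k))
    return format(r + short_count, "0{}b".format(k + 1))


def encode_distance(d, L):
    """N-1 ones, a zero, then the encoded remainder, for distance d >= 1."""
    if d < 1:
        raise ValueError("distance must be at least 1")
    n_minus_1, r = divmod(d - 1, L)
    return "1" * n_minus_1 + "0" + _remainder_bits(r, L)


def decode_distance(bits, L, pos=0):
    """Decode one distance starting at bits[pos]; return (d, next_pos)."""
    n_minus_1 = 0
    while bits[pos] == "1":
        n_minus_1 += 1
        pos += 1
    pos += 1                            # skip the terminating zero
    r = 0
    if L > 1:
        k = L.bit_length() - 1
        short_count = (1 << (k + 1)) - L
        r = int(bits[pos:pos + k], 2)
        if r < short_count:
            pos += k
        else:
            r = int(bits[pos:pos + k + 1], 2) - short_count
            pos += k + 1
    return n_minus_1 * L + r + 1, pos


L_example = run_length_parameter(0.1)           # ln 2 / 0.1 is about 6.93
stream = "".join(encode_distance(d, L_example) for d in (3, 12, 1))
```

Because each codeword is self-delimiting, successive distances can be
concatenated and recovered in order by repeated calls to
`decode_distance`.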

The  coding  of  distance  between  transitions  in  two  different 
lists,  i.e.,  at  times  j,  j+1,  introduces  an  undetermined  future  event. 
For  example,  when  coding  start  information  at  time  j,  no  information 
is  yet  available  about  transitions  at  time  j+1. 

If the last source to become active in the list of idle sources
at time j is $s_k$ ($0 < k \le K$), which is at distance $d_k$ from the
end of the list, the source node can only indicate to the receiver node
that the next source in transition is beyond the last source in the idle
list. The final codeword in the start information field for the jth
instant of time will consist of $N = \lceil d_k/L \rceil$ binary ones.
The receiver will count NL source positions down its own idle list,
which will take it beyond the final element. The receiver will assume
that all start information for time j is complete, and will look for
stop information.

Figure 6.4  (diagram labels: source, transition, distance, includes time
marker)

The first codeword for the start (or stop) information in the
(j+1)th time instant records the position, $d_1^{j+1}$, of the first
source in transition from the top of the idle (or busy) list (see
Figure 6.5).

The redundancy incurred by specifying the distance between the last
source at time j and the first source at time j+1 as two codewords
instead of one is found in Appendix 6 to be 0.614 bits. The distances
$d_1^{j+1}$ and $d_k^{j}$ still have a geometric distribution owing to
the special property of the source statistics, i.e., regardless of where
one starts to count to the next source in transition, the distance, d,
remains geometrically distributed.

The encoding of transitions in the busy source list proceeds in
the same manner, except that parameter L is redefined as

$$L = (1/\epsilon)\ln 2 \tag{6.3.7}$$


On receiving all start-stop information, the receiver can compute
the number of message bits originating at the active sources at the
particular instant of time. These bits are transmitted in the same order
as the sources appear in the busy list so that they may be routed to
the appropriate receivers, in corresponding positions in the receiver
node list.

Figure 6.5  Length encoding between lists, t = j, j+1

In the case of no idle (or busy) transitions in one time instant,
an appropriate number of binary ones are sent to indicate that the
distance between sources in transition exceeds the idle (or busy) list
size.

6.4  Performance  of  the  Huffman  coding  scheme 

In order to reduce cost in a network, i.e., minimize channel
capacity (allocated on a unit cost basis), it is necessary to design
protocol strategies with small coding redundancy. A further objective
of practical importance is to design protocols which minimize
transmission delays, including queueing time at nodes. The single
source/receiver link illustrated how queueing delays are related to the
expected value of overhead data as well as the second moment.

The arrival of message bits at the source node of a link with
K identical sources does present a queueing problem if the number of
sources is small, and the channel capacity only sufficient to transmit
an average message load. In order to eliminate second moments of
protocol data from this discussion, it has been assumed that the value
of K is large enough to allow the statistical law of large numbers to
operate, ensuring that the load at all times corresponds with channel
capacity. This assumption implies that a protocol strategy which
minimizes expected overhead data is to be considered the most efficient
scheme for conveying protocol information.

It is convenient to analyze the expected overhead or protocol
data associated with a single message delivered by one of the K
identical sources. This will allow us to compare the coding redundancy
of protocol information for the single source/receiver model with that
for the link with identical sources model.

The entropy of a single message from a Markov source, with mean
length $1/\epsilon$ and idle period $1/\delta$, is H(S),

$$H(S) = \frac{1}{\epsilon}\mathcal{H}(\epsilon) + \frac{1}{\delta}\mathcal{H}(\delta) + \frac{1}{\epsilon} \tag{6.4.1}$$

The first term represents message length, or stop information,
the second represents idle or start information, and the third is
message content. One bit of message data is delivered by the source,
when busy, each instant of time.
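
The per-message entropy (6.4.1) is the entropy per unit time of section
6.2 multiplied by the mean length of one busy period plus one idle
period. The sketch below checks this numerically; it is an illustration
only, assuming base 2 logarithms and the reading of (6.2.1) in which the
busy and idle probabilities are delta/(eps+delta) and eps/(eps+delta).
Function names are hypothetical.

```python
import math


def binary_entropy(x):
    """Binary entropy function in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def entropy_per_message(eps, delta):
    """Stop, start and message-content terms for one message, in bits."""
    return binary_entropy(eps) / eps + binary_entropy(delta) / delta + 1.0 / eps


def entropy_per_unit_time(eps, delta):
    """Entropy rate of the two-state source, in bits per time instant."""
    total = eps + delta
    return (delta + delta * binary_entropy(eps) + eps * binary_entropy(delta)) / total


eps, delta = 0.01, 0.002
cycle_length = 1.0 / eps + 1.0 / delta   # mean busy period plus idle period
per_message = entropy_per_message(eps, delta)
per_cycle = entropy_per_unit_time(eps, delta) * cycle_length
```

The two quantities `per_message` and `per_cycle` agree to within
rounding error, confirming that the two expressions describe the same
source.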

The statistics of a single source over many time instants are
identical to those of a chain of Markov sources at one time instant.
For example, consider the list of busy sources at time j. The length
between two sources in transition in the list has the same geometric
distribution as the busy period of a single Markov source over many
instants of time. The mean distance between transitions, and the busy
period length, is $1/\epsilon$. The Huffman coding of stop information
in earlier sections was for a single source over many time instants.
Here the encoding is performed for sources in transition in the busy
list at one time instant (see Figure 6.6).


Figure 6.6  Sources in transition at one time instant; single source
over many time instants


The coding described in Section 6.3 achieves a redundancy of mean
value about 0.03 bits (this fluctuates between 0.02 and 0.04 bits
depending on expected message length) above the source entropy, for each
encoded distance between transitions [7], excluding the extra 0.614 bits
required to indicate the time marker (see Appendix 6).

The redundancy incurred by specifying the end of the busy and idle
lists amounts to 1.288 bits (on average) per instant of time. Over a
long time period, the average number of messages per instant averages
out to be $\epsilon\delta K/(\epsilon+\delta)$, where the length of a
message and preceding idle period is $1/\epsilon + 1/\delta$. The weak
law of large numbers gives an average bit redundancy per message, R, of

$$R = 0.06 + 1.288\,\frac{\epsilon+\delta}{\epsilon\delta K} \tag{6.4.2}$$


Define $\eta$ as the ratio of the coding redundancy, R, to the length
of a message, $1/\epsilon$. Then

$$\eta = \epsilon R = 0.06\,\epsilon + \frac{1.288}{K} + \frac{1.288\,\epsilon}{K\delta} \tag{6.4.3}$$
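
The redundancy per message and the redundancy ratio can be evaluated for
given source parameters. The following is a minimal sketch with
hypothetical names; the constants 0.03 bits per encoded distance and
1.288 bits per time instant are those quoted in the text, and the
parameter values in the example are illustrative only.

```python
def redundancy_per_message(eps, delta, k, per_distance=0.03, per_instant=1.288):
    """Average redundant bits per message, R, as in (6.4.2).

    per_distance -- redundancy of one encoded distance (start or stop)
    per_instant  -- redundancy of marking the ends of both lists per instant
    """
    messages_per_instant = eps * delta * k / (eps + delta)
    return 2.0 * per_distance + per_instant / messages_per_instant


def redundancy_ratio(eps, delta, k):
    """Ratio of R to the mean message length 1/eps, as in (6.4.3)."""
    return eps * redundancy_per_message(eps, delta, k)


eps, delta = 0.01, 0.001
for k in (1000, 10000, 100000):
    print(k, redundancy_per_message(eps, delta, k), redundancy_ratio(eps, delta, k))
```

As K grows, the time marker contribution is shared among more messages
and R approaches the 0.06 bits of the single source/receiver model.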


Consider the case for large values of K, and infrequent message
arrivals, such that the idle period is much greater than the busy
period, $1/\delta \gg 1/\epsilon$. In the range $1 \le \delta K \le 100$,
the approximation for $\eta$ can be made, where

$$R \approx 0.06 + \frac{1.288}{\delta K}$$

and

$$\eta \approx \epsilon\left(\frac{1.288}{\delta K} + 0.06\right) \tag{6.4.4}$$


Graph 6.1 illustrates how the coding redundancy ratio depends on
message length, $1/\epsilon$, and number of transitions, $\delta K$, per
time instant for both start and stop protocols. The effect of the time
marker on a message becomes negligible in the upper range of $\delta K$
transitions. In such a case, the redundancy of protocol encoding
corresponds to that of the single source/receiver model, i.e., 0.06 bits
on average per message.

6.5  A universal  coding  approach 

A second approach to the coding of start-stop information can be
made using a universal coding scheme [12]. Again lists of idle and busy
sources are maintained at each node, and updated each instant of time.
The coding of source transitions in the two lists may best be explained
by an example.

Consider a list of eleven idle sources at time j-1. Two sources
become active before the jth time instant, say at the second and fifth
positions in the list. A binary word is constructed to represent this
change, where sources which remain idle are represented by a binary
zero, and those which become active by a binary one. The information for
this event would then be 01001000000, where the first bit of the word
refers to the first source in the list, etc.

The universal code proceeds by transmitting the number of
transitions, in this case two, as a run length codeword, i.e., 110.
Having conveyed the number of transitions, the possible number of eleven
bit binary words is reduced from $2^{11}$ to $\binom{11}{2} = 55$. Each
outcome is equally likely, and may be encoded by a fixed length word of
$\lceil \log_2(55) \rceil = 6$ bits. The receiver has a decoding table
for each of the possible numbers of transitions. Listed below are some
possible codewords for two transitions in an eleven source list:


    List            Codeword (6 bits)
    11000000000     000000
    10100000000     000001
    10010000000     000010
    10001000000     000011
    etc.
    01001000000     001110

The codeword for 01001000000 is the concatenation of 110 (two
transitions), and 001110 (location of the transitions).
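
The universal code can be sketched without stored decoding tables by
computing the index of a word among all words with the same number of
ones. This is an illustration under an assumption: the text does not
specify the table ordering beyond its first entries, so lexicographic
ordering of the transition positions is assumed here. That ordering
reproduces the first four table entries above, but the index it assigns
to other words need not coincide with the illustrative values in the
table. All function names are hypothetical.

```python
from math import comb


def _rank(positions, n):
    """Lexicographic index of a sorted position set among all q-subsets of n."""
    q, rank, prev = len(positions), 0, -1
    for i, c in enumerate(positions):
        for x in range(prev + 1, c):
            rank += comb(n - x - 1, q - i - 1)  # subsets skipped by passing x
        prev = c
    return rank


def _unrank(rank, n, q):
    """Inverse of _rank: recover the sorted positions from the index."""
    positions, x = [], 0
    for i in range(q):
        while rank >= comb(n - x - 1, q - i - 1):
            rank -= comb(n - x - 1, q - i - 1)
            x += 1
        positions.append(x)
        x += 1
    return positions


def universal_encode(word):
    """Unary count of transitions, then a fixed-length position index."""
    n = len(word)
    positions = [i for i, bit in enumerate(word) if bit == "1"]
    q = len(positions)
    width = (comb(n, q) - 1).bit_length()   # ceiling of log2 of outcome count
    index = format(_rank(positions, n), "0{}b".format(width)) if width else ""
    return "1" * q + "0" + index


def universal_decode(code, n):
    """Recover the n-bit transition word from its codeword."""
    q = code.index("0")                      # number of leading ones
    width = (comb(n, q) - 1).bit_length()
    rank = int(code[q + 1:q + 1 + width], 2) if width else 0
    chosen = set(_unrank(rank, n, q))
    return "".join("1" if i in chosen else "0" for i in range(n))


codeword = universal_encode("01001000000")
```

The codeword consists of the three bit run length prefix 110 followed by
a six bit index, nine bits in all, and decodes back to the original
eleven bit word.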

The universal scheme is most efficient for small numbers of
transitions. It has one advantage over the Huffman scheme in that it
is constructed for a finite coding tree, and does not need to specify
time markers between lists. Inspecting the coding tree in Figure 6.3,
it is seen that the universal scheme, by specifying the number of
transitions, reduces the tree to one branch with $\binom{K_I}{q}$
equally likely outcomes. $K_I$ is the number of sources in the list, and
q the number of transitions.

6.6  Performance of the universal code

Assuming that the lists, $K_I$ and $K_B$, are very long, and the
number of transitions is large, Stirling's approximation may be employed
to give an estimate of the average codeword length for start and stop
information, $n_I$ and $n_B$. The distribution of transitions in the
lists has been given in (6.3.1) and (6.3.2). The length of the run
length codeword conveying the number of transitions is q + 1. The
codeword conveying position has length $\log_2\binom{K_I}{q}$.


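$$n_I = E(q+1) + E\left(\log_2\binom{K_I}{q}\right) \tag{6.5.1}$$

Stirling's approximation gives

$$\log_2\binom{K_I}{q} \approx K_I\,\mathcal{H}(q/K_I)$$

For sufficiently large $K_I$, $q/K_I \approx \delta$, so that

$$n_I \approx \delta K_I + 1 + K_I\,\mathcal{H}(\delta) \tag{6.5.2}$$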


To evaluate the number of protocol bits per message required to
transmit start-stop information, note that $n_I$ represents the jointly
encoded start information for $\delta K_I$ transitions. For each
transition,

$$\frac{n_I}{\delta K_I} \approx 1 + \frac{1}{\delta}\mathcal{H}(\delta) \quad \text{(bits/message)}; \qquad \delta K_I \gg 1 \tag{6.5.3}$$

The second term is the entropy of the start information, implying
that one bit of redundant code per message is required for start
information. Similarly, one bit of redundancy occurs in the stop
information, giving a total of two bits of redundant protocol data per
message in the universal coding scheme.
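
The estimate (6.5.3) can be checked against the exact expectation in
(6.5.1) for a finite list. The sketch below is illustrative only: it
assumes base 2 logarithms, uses the uncorrected position length
log2 of the binomial coefficient as in (6.5.1), and the list size of
2000 with delta = 0.05 is an arbitrary example. Function names are
hypothetical.

```python
import math


def binary_entropy(x):
    """Binary entropy function in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def start_bits_per_transition(k_idle, delta):
    """Exact E(q + 1 + log2 C(k_idle, q)) divided by delta * k_idle."""
    expected_bits = 0.0
    for q in range(k_idle + 1):
        outcomes = math.comb(k_idle, q)
        # Binomial probability computed in the log domain to avoid overflow
        log_pmf = (math.log(outcomes) + q * math.log(delta)
                   + (k_idle - q) * math.log(1.0 - delta))
        expected_bits += math.exp(log_pmf) * ((q + 1) + math.log2(outcomes))
    return expected_bits / (delta * k_idle)


exact = start_bits_per_transition(k_idle=2000, delta=0.05)
estimate = 1.0 + binary_entropy(0.05) / 0.05
```

For these values the exact figure lies within a small fraction of a bit
of the estimate, so the redundancy of about one bit per message for
start information is already apparent at moderate list sizes.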


negligible  for  large  values  of  K (see  Graph  6.1). 

The universal scheme is less efficient when several transitions
take place per time instant, with a bit redundancy per message of two.
It is also considerably less practical due to the large number of
decoding tables which must be kept at the receiver node to identify the
position of source transitions. The Huffman scheme, a run length
coding technique, is simple to implement, and efficient in terms of
average length of codewords.

6.8  System  design  implications 

A concentrator, as incorporated into our simple K source/receiver
network, is an efficient means of allocating channel bandwidth
to many users. It avoids the need for address information, as do time
division multiplexing systems, but also allocates channel space
dynamically, thus ensuring no empty time slots.

Simple time division multiplexing systems provide regular time
slots for all sources sharing a link. If one or more sources are idle,
the corresponding time slots will be empty. The concentrator keeps a
list of all active sources, and allocates one time slot to each of these
per time instant. No channel space is allocated to idle sources. The
statistical allocation of bandwidth by a concentrator is as efficient as
transmitting messages in a single queue, but in addition eliminates
address information which would be necessary under a single server
approach (see Figure 6.7).

Appendix 6

Encoding  distance  between  sources  at  time  instants  j,  j+1 

Consider the start information for time instant j. In the list
of idle sources, the last source to become active is distance $d_k^{j}$
from the end of the list. There may be no transition at time j, in which
case $d_k^{j}$ is the total number of sources in the list. For positive
integers $N_1 = \lceil d_k^{j}/L \rceil$ and $R_1$ ($0 \le R_1 < L$),
the coding of $d_k^{j}$ is performed according to the procedure of
Section 6.3, where $L = (1/\delta)\ln 2$ and

$$d_k^{j} = (N_1 - 1)L + R_1 \tag{6a}$$

The codeword for $d_k^{j}$ consists of $N_1$ binary ones, which
indicates to the receiver that the next transition is distance $N_1 L$
sources from the previous one, which will be beyond the final element of
the idle list at time instant j. In the next time instant, j+1, the
first codeword of the start information will give the position of the
first source in transition from the top of the idle list, $d_1^{j+1}$.
If there are no sources in transition in this list, then a procedure
similar to that used for the final codeword at time j is adopted. For
positive integers $N_2$ and $R_2$, where $0 \le R_2 < L$, the distance

$$d_1^{j+1} = (N_2 - 1)L + R_2 \tag{6b}$$

The coding of $d_1^{j+1}$ will consist of $N_2 - 1$ binary ones followed
by a zero, and a variable length encoding of $R_2$ of average length
$\log_2(L)$. In the case of no transitions, $N_2$ ones are sent alone.
The number of sources between the two sources in transition (in lists
t = j, j+1) is d,

$$d = d_k^{j} + d_1^{j+1} = (N_1 + N_2 - 2)L + R_1 + R_2 \tag{6c}$$

(see Figure 6.5).

The mean lengths of the codewords for $d_k^{j}$ and $d_1^{j+1}$
together exceed the mean length of d, as coded directly by the Huffman
scheme, by a fraction of a bit, x. This fraction is the redundancy
caused by the additional specification of a new list at time j+1 in
addition to the distance to the next source in transition, d.

Having defined the procedure for encoding distances $d_k^{j}$ and
$d_1^{j+1}$, it is now possible to calculate x by comparing the mean
lengths of the two codewords to that of d (coded as an integer by the
Huffman scheme). Consider the event A, which occurs when
$R_1 + R_2 < L$; the codeword for d has length $n_d$, where

$$n_d = N_1 + N_2 - 1 + \log_2 L \tag{6d}$$

A second event, B, occurs when $R_1 + R_2 \ge L$, such that the length
of the codeword for d becomes

$$n_d = N_1 + N_2 + \log_2 L \tag{6e}$$

The combined lengths of the two codewords for $d_k^{j}$ and
$d_1^{j+1}$ are $n_d^{1}$ and $n_d^{2}$, where

$$n_d^{1} + n_d^{2} = (N_1) + (N_2 + \log_2 L) \tag{6f}$$

Only during event A is the coding of d performed more efficiently
by one codeword rather than two, comparing (6f) with (6d, 6e). In
event A, a single bit must be included in the start information to
indicate the time division between lists. Thus the mean value of the
bit redundancy is x, where

$$x = 1\cdot\Pr(\text{event A}) + 0\cdot\Pr(\text{event B}) = \Pr(\text{event A}) = \Pr(R_1 + R_2 < L) \tag{6g}$$


The integers $R_1$ and $R_2$ are random variables with distributions
of the form

$$P_R(r) = \frac{\delta(1-\delta)^{r-1}}{1-(1-\delta)^{L}} \tag{6h}$$

where $0 < r < L$. The two variables are statistically independent. As
$\Pr(A) = 1 - \Pr(B)$,

$$\Pr(A) = 1 - \sum_{r_1=0}^{L-1} P_{R_1}(r_1)\,\Pr(R_2 \ge L - r_1 \mid r_1) \tag{6i}$$


Evaluating the previous expression,

$$\Pr(A) = 1 - \frac{\delta L(1-\delta)^{L-2}}{\left(1-(1-\delta)^{L}\right)^{2}} + \frac{(1-\delta)^{L}}{1-(1-\delta)^{L}}$$

For large values of L, i.e., long idle periods, the approximation below
can be made,

$$(1-\delta)^{L} \approx 1/2$$

where $L = (1/\delta)\ln 2$, so that $\delta L = \ln 2$. The probability
of event A, for long idle periods ($1/\delta \gg 1$), becomes

$$x = \Pr(A) \approx 2 - 2\ln 2 \approx 0.614 \text{ bits}$$

In the case where no transitions occur at time j, the analysis remains
unchanged, although $d_k^{j}$ covers the entire list of $K_I(j)$
elements rather than the final part.
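
The value of x can be checked numerically by summing the joint
distribution of the two remainders directly. The sketch below is an
illustration under assumptions: L is rounded to an integer, each
remainder takes the values 1 to L with the truncated geometric
distribution (6h), and the choice delta = 0.001 is an arbitrary example
of a long idle period. The function name is hypothetical.

```python
import math


def prob_event_a(delta):
    """Probability that R1 + R2 < L for independent truncated geometric R1, R2."""
    L = max(2, round(math.log(2.0) / delta))
    norm = 1.0 - (1.0 - delta) ** L
    # pmf[r - 1] is the probability that a remainder equals r, r = 1..L
    pmf = [delta * (1.0 - delta) ** (r - 1) / norm for r in range(1, L + 1)]
    # cdf[m] is the probability that a remainder is at most m
    cdf = [0.0]
    for p in pmf:
        cdf.append(cdf[-1] + p)
    total = 0.0
    for r1 in range(1, L + 1):
        largest_r2 = L - r1 - 1          # largest r2 with r1 + r2 < L
        if largest_r2 >= 1:
            total += pmf[r1 - 1] * cdf[largest_r2]
    return total


x = prob_event_a(0.001)
limit = 2.0 - 2.0 * math.log(2.0)
```

For long idle periods the computed probability is close to
2 - 2 ln 2, i.e., about 0.614 bits, in agreement with the result above.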


Section  7 


The  Conclusion 

7.1  Summarizing  the  network  protocol  problem 

Three categories of protocol information have been discussed at
length here in order to illustrate a design procedure based on source
coding. These include the beginning, end, and destination of messages.
The objective in each case was to find a protocol strategy which
minimizes the average control data which accompanies messages during
transmission. In so doing, channel bandwidth requirements and
transmission delays are minimized, leading to a more efficient usage of
network resources.

7.2  The design of efficient network protocols

Protocol information is necessary in a communications network to
resolve the statistical uncertainties associated with incoming messages,
including arrival time, length and destination. These uncertainties may
be modelled by separate information sources from those supplying message
data. The design approach adopted here was to find efficient source
encodings which met the lower bounds already constructed for some
protocols [4].

The availability of reliable statistics of network users is
essential in order to achieve efficient source codes. For purposes of
illustration, some standard distributions have been assumed, including
geometrically distributed message lengths and Poisson arrivals.




7.3  Start-stop protocols

The start-stop protocols associated with message arrival and
length information are found to be related to the lengths of idle and
busy periods of the information source. A Huffman coding scheme is
available to encode efficiently the geometrically distributed integers
corresponding to these periods. The scheme is discussed both for
single source and receiver links (Sections 3, 4, 5) and for many
identical source/receiver pairs sharing a single link (Section 6).

For the single source/receiver model, the condition of variable
length packets was found necessary to achieve a small coding redundancy
of message length, or stop information. Such a condition may be met
by using either a terminating flag character or a Huffman length
encoding (with slight advantage to the latter scheme). Section 5.6
summarizes the relative performance of the fixed packet, terminal flag
and Huffman length encoding approaches to stop information for single
source/receiver models.

By employing an information concentrator in a network with identical
sources sharing a single link, the need for separate address information
was eliminated. It was found sufficient to provide the receiver
node with start-stop information for each source/receiver pair individually
in order to communicate messages to the appropriate destinations.
Again a Huffman length encoding scheme efficiently conveyed the start-stop
information, which contained the lengths of the idle and busy
periods. Section 6.7 summarizes the efficiency of this approach, and
Section 6.8 comments on design implications.

7.4  Problems  of  future  interest 

The problem of transmission delay is of major practical importance
in communication networks. A source coding approach to protocol
information  is  able  to  reduce  network  overheads  to  a minimum,  but 
sometimes  at  the  expense  of  increased  delay.  For  example,  it  is  more 
efficient  to  perform  joint  encodings  of  message  lengths  and  arrivals 
than  to  send  individual  protocol  information  for  each  message  to  be 
transmitted.  However,  joint  encoding  assumes  that  one  waits  for  several 
messages to arrive in a queue before commencing transmission. The
relationship  between  delay  and  efficient  protocol  coding  is  yet  to  be 
explored  from  a practical  standpoint. 
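
The trade-off between efficiency and delay can be illustrated with a small Python sketch. It assumes a hypothetical binary source (a slot is busy with probability p = 0.1, independently of other slots) and encodes blocks of k slots jointly with a textbook Huffman code. The source model, the value of p and the function names are assumptions for illustration and are not taken from the preceding sections.

```python
import heapq
import itertools
import math


def huffman_lengths(probs):
    """Return the codeword lengths of a binary Huffman code for `probs`."""
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        # Each merge adds one bit to every symbol in the merged subtrees.
        for s in s1 + s2:
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths


def bits_per_slot(p, k):
    """Average Huffman bits per slot when k slots are encoded jointly."""
    block_probs = []
    for block in itertools.product((0, 1), repeat=k):
        busy = sum(block)
        block_probs.append(p ** busy * (1 - p) ** (k - busy))
    lengths = huffman_lengths(block_probs)
    return sum(q * l for q, l in zip(block_probs, lengths)) / k


def binary_entropy(p):
    """Entropy of a Bernoulli(p) source in bits per slot."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


if __name__ == "__main__":
    p = 0.1
    print(f"entropy bound: {binary_entropy(p):.4f} bits/slot")
    for k in (1, 2, 4, 8):
        print(f"k = {k}: {bits_per_slot(p, k):.4f} bits/slot")
```

The per-slot overhead of a block code lies within 1/k bits of the entropy bound, but the encoder must collect k slots before it can emit a codeword, which is the source of the additional delay.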

A second major issue in network protocols is routing and
supervisory information. No bounds yet exist for such information with
which to evaluate existing protocols or to devise more efficient ones.

The problem is complicated by the intimate relationship between effective
control and state information in a network. By supplying additional
information on traffic flow conditions, it is generally possible
to improve routing of messages, and thus utilise network capacity more
efficiently. However, the control information itself reduces network
capacity and message flow. This field will require a joint control
and communications approach.

The rapid expansion of data networks in the near future should
provide the economic incentives to improve the efficiency of overhead
information. The study of protocol structure and implementation could
offer significant savings in system overheads.